SR-IOV

SR-IOV (Single Root I/O Virtualization) is an I/O virtualization technology, used here for network adapters. The basic idea is that a physical device can present multiple virtual instances of itself, and these can be assigned to any LPAR running on the managed system. There is only one physical adapter, but each LPAR thinks it has its own dedicated adapter. This is achieved by creating logical ports on top of the physical ports of the adapter. These logical ports exist in the hypervisor (firmware), so no VIOS is necessary. To see the SR-IOV menu items in the HMC GUI, both the system firmware and the HMC must be at a supported level.

Dedicated Mode - Shared Mode:
An SR-IOV capable adapter is either in dedicated mode or in shared mode.
In dedicated mode, the adapter is owned by one LPAR, and the physical ports of the adapter are owned by that partition. (This is the traditional configuration.)
In shared mode, the adapter is owned by the hypervisor. In this mode, logical ports can be created and assigned to any LPAR.

Logical Port (LP) - Virtual Function (VF)
A Virtual Function (VF) is a general term used by the PCI standards. We can think of it as a slice of a physical port on the adapter. On IBM Power Systems, SR-IOV implements VFs as logical ports. A Logical Port (LP) is an I/O device created for a partition to access a Virtual Function on the adapter. When a Logical Port is created, the hypervisor configures a virtual function on the adapter and maps it to the Logical Port; the two are in a 1-to-1 relationship. (General documentation mostly uses the term VF, but during SR-IOV configuration on the HMC the term LP is used. The standards also define a Physical Function (PF), which we can think of as a reference to the physical port.)

------------------------------------

Capacity (SR-IOV desired bandwidth)

During Logical Port (Virtual Function) creation, the desired capacity needs to be configured. This is similar to the Entitled Capacity setting for CPU, except that it is expressed as a percentage. The configured value is the desired minimum bandwidth, as a percentage of the physical port. It is not capped, which means that if there is additional bandwidth that is not currently being used, it is shared equally among all logical ports. Assignments are made in increments of 2%, and the total assignments for a single port cannot exceed 100%. Capacity cannot be changed dynamically. (It can be changed in the partition profile, after which the profile has to be activated.)

So, when an LPAR needs it, it gets its desired percentage of the outgoing bandwidth. If additional outgoing bandwidth is available, any partition can use it. If a partition doesn't need its minimum, that bandwidth is available to other partitions until the owning partition needs it. Capacity settings have no influence on incoming bandwidth.
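To make the "desired minimum, not capped" behaviour more concrete, here is a simplified Python model of one physical port. It is only a sketch under assumptions: the outgoing demand of every logical port is known, each port first gets up to its desired capacity, and the leftover bandwidth is split in equal shares among the ports that still want more (re-dividing whatever a satisfied port does not use). The function name share_bandwidth and its inputs are made up for illustration; the real firmware scheduling is not described in this post.

```python
def share_bandwidth(port_gbps, capacity_pct, demand_gbps):
    """Simplified model of outgoing bandwidth sharing on ONE physical port.

    port_gbps:    speed of the physical port (e.g. 10 for 10 Gbps)
    capacity_pct: desired (minimum) capacity of each logical port, in percent
    demand_gbps:  outgoing traffic each logical port currently wants to send
    Returns the bandwidth (Gbps) each logical port gets in this model.
    """
    # Step 1: every logical port gets up to its guaranteed minimum.
    alloc = [min(d, port_gbps * c / 100) for c, d in zip(capacity_pct, demand_gbps)]
    spare = port_gbps - sum(alloc)

    # Step 2 (assumed policy): split the spare bandwidth equally among ports
    # that still want more; repeat when a port is satisfied and leaves part
    # of its share unused.
    hungry = [i for i, d in enumerate(demand_gbps) if d > alloc[i]]
    while spare > 1e-9 and hungry:
        share = spare / len(hungry)
        for i in hungry:
            give = min(share, demand_gbps[i] - alloc[i])
            alloc[i] += give
            spare -= give
        hungry = [i for i in hungry if demand_gbps[i] - alloc[i] > 1e-9]
    return alloc
```

For example, on a 10 Gbps port with capacities of 20/30/50% and outgoing demands of 8/1/2 Gbps, `share_bandwidth(10, [20, 30, 50], [8, 1, 2])` gives the first logical port 7 Gbps, well above its 2 Gbps minimum, because the other two are not using their share. Incoming traffic is not part of the model at all, since capacity does not influence it.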

------------------------------------

SR-IOV and LPM:


The picture shows an adapter with 2 physical ports; one of those ports is virtualized into 3 logical ports or VFs (yellow squares).

From a virtualization perspective, SR-IOV logical ports are seen as physical adapters at OS level; therefore, operations like Live Partition Mobility are not supported when an SR-IOV logical port is configured on the partition (LPAR B).

If a logical port is part of a SEA (which bridges traffic from client partitions to the physical network), the client LPAR (LPAR A) has only a virtual Ethernet adapter, so it can still use Live Partition Mobility.

------------------------------------

SR-IOV and Link Aggregation

In an LACP configuration, multiple primary SR-IOV logical ports are allowed. When LACP (IEEE 802.3ad) is configured with multiple primary logical ports, only SR-IOV logical ports can be part of the link aggregation, and only one logical port can be configured per physical port.

So, with LACP only one logical port per physical port can be used.
(The second configuration, with more than one logical port assigned to a physical port, will not work.)


To prevent users from adding another logical port to the physical port while LACP is being used, you can set the logical port capacity to 100%.
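The "one logical port per physical port" rule can be checked against an inventory of logical ports. A minimal Python sketch, assuming a hypothetical inventory where every logical port on the system (from any LPAR) is listed as an (adapter_id, phys_port_id) pair; the function name lacp_violations is made up for illustration.

```python
from collections import Counter


def lacp_violations(aggregation, all_logical_ports):
    """Return the physical ports used by an LACP aggregation that carry more
    than one logical port, which the LACP rule does not allow.

    aggregation:       (adapter_id, phys_port_id) pairs used by the LACP members
    all_logical_ports: (adapter_id, phys_port_id) of EVERY logical port on the
                       system, from any LPAR (hypothetical inventory input)
    """
    per_port = Counter(all_logical_ports)
    return sorted(port for port in set(aggregation) if per_port[port] > 1)
```

For example, `lacp_violations([(1, 0), (1, 1)], [(1, 0), (1, 1), (1, 1)])` flags `(1, 1)`. Giving each LACP member 100% capacity keeps this list empty by construction, because no capacity is left for a second logical port on that physical port.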

------------------------------------

SR-IOV and Etherchannel (NIB):

In an active-passive configuration (Network Interface Backup), an SR-IOV logical port can be the primary (active) adapter, the backup (passive) adapter, or both. If more than one primary adapter is configured in an EtherChannel, an SR-IOV logical port cannot be a primary adapter. When an SR-IOV logical port is used in an active-passive configuration, the EtherChannel must be configured to detect when to fail over from the primary to the backup adapter. This can be achieved by configuring an IP address to ping (the netaddr attribute of the EtherChannel).
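The NIB rules above can be written down as a small validation function. This is a sketch under assumptions: adapters are described as hypothetical (name, is_sriov_logical_port) tuples, and netaddr stands for the IP address to ping; the function itself is illustrative, not an AIX or HMC tool.

```python
def nib_config_errors(primaries, backup, netaddr=None):
    """Check an EtherChannel / NIB plan against the SR-IOV rules in the text.

    primaries: list of (adapter_name, is_sriov_logical_port) tuples
    backup:    (adapter_name, is_sriov_logical_port) tuple, or None
    netaddr:   IP address to ping for failover detection, or None if unset
    Returns a list of rule violations (an empty list means the plan is OK).
    """
    errors = []
    # Rule 1: with more than one primary, an SR-IOV logical port cannot be primary.
    if len(primaries) > 1 and any(is_lp for _, is_lp in primaries):
        errors.append("SR-IOV logical port cannot be primary when there are multiple primaries")
    # Rule 2: active-passive with an SR-IOV logical port needs a ping address.
    uses_lp = any(is_lp for _, is_lp in primaries) or (backup is not None and backup[1])
    if uses_lp and backup is not None and not netaddr:
        errors.append("an IP address to ping (netaddr) is required for failover detection")
    return errors
```

With an SR-IOV primary and backup but no ping address, `nib_config_errors([("ent4", True)], ("ent5", True))` reports the missing netaddr; adding `"10.0.0.1"` as the third argument clears it (the adapter names and address are only examples).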

------------------------------------

SR-IOV Configuration:

The main steps needed for SR-IOV:
- change adapter to shared mode
- configure physical ports
- configure logical ports

It is also important to meet the necessary requirements: HMC level, firmware level, a compatible SR-IOV adapter, and managed system capabilities (SR-IOV capable: True).


Change adapter from Dedicated to Shared mode
Man. Sys. --> HW Virtualized IO --> choose adapter --> Modify SR-IOV Adapter

This happens at managed system level, so the adapter must not be assigned to any LPAR. If you check again after the modification, you will see that the adapter is owned by the hypervisor. It is possible to switch back to dedicated mode, but any configured logical ports must be deconfigured first.

------------------------------------

Physical port (PP) config: 
Man. Sys. --> HW Virtualized IO --> choose an adapter --> choose a port



Label, Sub-label: These help to identify a specific port. Later, during Logical Port creation, they make it easier to identify which physical port to use. During LPM, the HMC recognizes physical port labels (not sub-labels) for vNIC devices: if a label is set, the HMC requires the same label to be present on the target system.
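The label requirement for LPM can be pictured as a filter over the target system's physical ports. A minimal Python sketch, assuming a hypothetical dict that maps each target (adapter_id, phys_port_id) to its label; sub-labels are deliberately ignored, as in the text. This only models the rule, it is not how the HMC itself selects ports.

```python
def matching_target_ports(source_label, target_ports):
    """Return the target physical ports that satisfy the label requirement.

    source_label: label of the physical port backing the vNIC on the source
                  system ('' or None if no label is set)
    target_ports: dict mapping (adapter_id, phys_port_id) -> label on the target
                  (sub-labels are not considered)
    """
    if not source_label:
        # No label set: the label does not restrict the choice.
        return sorted(target_ports)
    return sorted(port for port, label in target_ports.items() if label == source_label)
```

If no target port carries the same label, the result is empty, which corresponds to the case where the label requirement cannot be met on that target system.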

Configured Speed: 1Gbps, 10Gbps...

MTU Size: 1500, 9000. If the physical port is configured to MTU 1500 and a logical port attempts to use jumbo frames, oversized packets will get dropped.

Port Switch Mode: Virtual Ethernet Bridge (VEB) or Virtual Ethernet Port Aggregator (VEPA):
            VEB (This is the default): Bridging between logical ports (VFs) on the same physical port is done by the adapter.
            (Logical port to logical port traffic is not exposed to an external switch, lower latency.)
            VEPA: Bridging between logical ports on the same physical port is done by an external switch. 
            (Logical port to logical port traffic flows out to the external switch and then back. Switch can control traffic if needed.)

Flow Control: On or Off (default is off)
Flow control regulates data transfer to prevent a fast sender from overrunning a slow receiver. If the receiver has a heavy traffic load or less processing power, the sender may transmit faster than the destination can process the data; in this case flow control can pause the traffic (with priority flow control, it can selectively pause only certain traffic on the link, for example only iSCSI or FCoE traffic). Modern systems can usually buffer the data before the buffers fill up, so they typically do not need flow control.

Logical Port Limits: How many Logical Ports can be created on this Physical Port.
            Keep in mind that the total desired capacity cannot exceed 100% of the physical port.
            For example, if 4 Logical Ports with 25% each have been created, no additional LPs can be created on that physical port.
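Both limits (the Logical Port Limits value and the 100% capacity total) decide how many more logical ports fit on a physical port. A small Python sketch of that arithmetic; the function name and inputs are hypothetical.

```python
def additional_logical_ports(existing_pct, new_pct, port_lp_limit):
    """How many more logical ports with capacity new_pct fit on one physical port.

    existing_pct:  desired capacities (%) of the logical ports already on the port
    new_pct:       desired capacity (%) planned for each new logical port (> 0)
    port_lp_limit: the 'Logical Port Limits' value set on the physical port
    """
    remaining_pct = 100 - sum(existing_pct)
    by_capacity = remaining_pct // new_pct        # limited by the 100% total
    by_limit = port_lp_limit - len(existing_pct)  # limited by the port's LP limit
    return max(0, min(by_capacity, by_limit))
```

With four existing 25% ports, `additional_logical_ports([25, 25, 25, 25], 2, 20)` returns 0, matching the example above; with `[20, 10]` already used, 10% ports and a limit of 4, the LP limit (2 more) is the tighter constraint.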

------------------------------------

Logical Port config:
A Logical Port is mapped to an LPAR, so an LPAR has to be chosen before configuration.
choose an LPAR --> HW Virtualized IO --> add port --> choose an adapter with a specific physical port --> choose a port


Capacity: This is the desired minimum bandwidth, as a percentage of the physical port's capacity.
          For 100/40 Gb adapters any value can be used; for other adapters only even numbers can be used, up to 100%.
          Logical ports can get more if there is unused bandwidth; unused bandwidth is shared equally among all logical ports.
          Capacity settings apply only to transmitted traffic, not to received traffic.
          Capacity cannot be changed dynamically (plan it carefully during creation).
          The total desired capacity cannot exceed 100% per physical port (this limits how many logical ports can be created).
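Before creating the logical ports, a capacity plan for one physical port can be checked against these rules. A minimal Python sketch, assuming integer percentages and a hypothetical flag high_speed_adapter for the 100/40 Gb adapters that accept any value.

```python
def validate_capacities(capacities, high_speed_adapter=False):
    """Check the desired capacities (%) planned for ONE physical port.

    capacities:         desired capacity of each planned logical port, in percent
    high_speed_adapter: True for 100/40 Gb adapters (any value allowed);
                        otherwise only even values are accepted
    Returns a list of problems (an empty list means the plan looks valid).
    """
    problems = []
    for c in capacities:
        if not 0 < c <= 100:
            problems.append(f"{c}% is out of range")
        elif not high_speed_adapter and c % 2 != 0:
            problems.append(f"{c}% is not an even value")
    if sum(capacities) > 100:
        problems.append(f"total {sum(capacities)}% exceeds 100%")
    return problems
```

Because capacity cannot be changed dynamically, running a check like this while planning is cheaper than recreating the logical ports later.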

Promiscuous: If the logical port will be assigned to a VIOS in a SEA, promiscuous mode must be enabled.
         (Promiscuous mode can be enabled on only one logical port per physical port.)
          When promiscuous mode is enabled, Allow All VLAN IDs and Allow All O/S Defined MAC Addresses are the only options available.
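The two promiscuous-mode rules (at most one promiscuous logical port per physical port, and no VLAN/MAC restrictions on it) can be expressed as a check. A Python sketch under assumptions: each logical port is a dict with hypothetical keys name, promiscuous, vlan_restriction and mac_restriction, using made-up values like "allow_all" and "deny_all".

```python
def promiscuous_problems(logical_ports):
    """Check the promiscuous-mode rules for the logical ports of ONE physical port.

    logical_ports: list of dicts with hypothetical keys 'name', 'promiscuous',
                   'vlan_restriction' and 'mac_restriction'
                   (restriction values such as 'allow_all', 'deny_all').
    """
    problems = []
    promisc = [lp for lp in logical_ports if lp["promiscuous"]]
    if len(promisc) > 1:
        problems.append("more than one promiscuous logical port on this physical port")
    for lp in promisc:
        # A promiscuous logical port cannot restrict VLANs or MAC addresses.
        if lp["vlan_restriction"] != "allow_all" or lp["mac_restriction"] != "allow_all":
            problems.append(f"{lp['name']}: promiscuous mode requires Allow All VLANs and MACs")
    return problems
```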

MAC Address: The management console auto-assigns the default MAC address unless a specific MAC address is set.

MAC Address restrictions: (If Promiscuous is selected, neither VLANs nor MAC addresses can be restricted.)
         Allow All – No restrictions on which MAC addresses can be used.
         Deny All – The OS can only use the default MAC address.
         Allow Specified – Set a list of MAC addresses that the OS is allowed to use. The MAC address still needs to be configured in the OS.

VLAN restrictions: Restricts the VLANs that the logical port device driver can use.
        (If Promiscuous is selected, neither VLANs nor MAC addresses can be restricted.)
         Allow All – No restrictions on which VLANs can be used.
         Deny All – The OS cannot configure a VLAN ID. The OS will only receive untagged packets.
         Allow Specified – Set a list of VLAN IDs that the OS is allowed to use for the LP. (The VLANs still need to be configured in the OS.)

         On POWER9, if a PVID other than 0 is configured, VLAN Restrictions must be set to Deny All OS VLAN IDs.
         The OS is then only allowed to send and receive untagged frames.
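The VLAN restriction modes, together with the POWER9 PVID rule, can be summarised as one decision function. A Python sketch, assuming hypothetical mode names ("allow_all", "deny_all", "allow_specified") and using VLAN ID 0 to mean "untagged"; it models the rules in the text, not the adapter driver.

```python
def vlan_allowed(vlan_id, restriction, allowed=(), pvid=0):
    """Would the OS be allowed to use this VLAN ID on the logical port?

    vlan_id:     VLAN the OS wants to use (0 means untagged traffic)
    restriction: 'allow_all', 'deny_all' or 'allow_specified' (hypothetical names)
    allowed:     VLAN IDs permitted when restriction == 'allow_specified'
    pvid:        Port VLAN ID of the logical port (0 means not set)
    """
    if pvid != 0 and restriction != "deny_all":
        # POWER9 rule from the text: a non-zero PVID requires Deny All.
        raise ValueError("with a non-zero PVID, VLAN restrictions must be Deny All")
    if vlan_id == 0:
        return True  # untagged traffic is possible in every mode
    if restriction == "allow_all":
        return True
    if restriction == "deny_all":
        return False
    return vlan_id in allowed
```

Remember that "allowed" here only means the hypervisor permits it; the VLAN still has to be configured in the OS.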

Port VLAN ID: Set a non-zero PVID to have the adapter add a VLAN tag with this VLAN ID to all untagged transmit packets and strip the VLAN tag from received packets. Received packets that match this VLAN ID are delivered to the OS as untagged packets.
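The PVID tagging behaviour can be sketched as two small functions, one per direction. This is a simplified Python model under assumptions: a frame's VLAN is represented as None (untagged) or an integer ID, and the function names are hypothetical. What the adapter does with received frames carrying a different tag is not described here, so the sketch just passes them through unchanged as a placeholder.

```python
UNTAGGED = None  # marker for a frame without a VLAN tag


def tx_on_wire(os_vlan, pvid):
    """VLAN tag on the wire for a frame the OS transmits."""
    if pvid != 0 and os_vlan is UNTAGGED:
        return pvid  # the adapter inserts the PVID tag
    return os_vlan


def rx_to_os(wire_vlan, pvid):
    """VLAN the OS sees for a frame received with tag wire_vlan."""
    if pvid != 0 and wire_vlan == pvid:
        return UNTAGGED  # the adapter strips the matching PVID tag
    # Not covered by the text; passed through unchanged only as a placeholder
    # (real adapters may filter such frames).
    return wire_vlan
```

With a PVID of 100, an untagged frame from the OS leaves as VLAN 100, and a frame arriving on VLAN 100 reaches the OS untagged, so the OS never has to know about the VLAN.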

Port VLAN ID (PVID) Priority: A value between 0 and 7; it only applies if the PVID is set to a non-zero value. (This is supported only on specific adapters.)
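The 0 to 7 range comes from the IEEE 802.1Q tag, whose 16-bit Tag Control Information field holds a 3-bit priority (PCP), a 1-bit DEI and a 12-bit VLAN ID. Presumably the PVID priority ends up in the PCP bits of the tag the adapter inserts; that mapping is an assumption here. A Python sketch of how the field is packed:

```python
def vlan_tci(vid, priority=0, dei=0):
    """Build the 16-bit IEEE 802.1Q Tag Control Information (TCI) field.

    Layout: 3-bit priority (PCP) | 1-bit DEI | 12-bit VLAN ID.
    """
    if not 0 <= priority <= 7:
        raise ValueError("priority must be between 0 and 7")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    return (priority << 13) | (dei << 12) | vid
```

For example, VLAN 100 with priority 5 gives `vlan_tci(100, 5) == 0xA064`: the priority occupies the top three bits, which is why only 8 values exist.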

------------------------------------

POWER9 Enhancements: Max Capacity
Max Capacity can be set on a Logical Port (as a percentage); it is the maximum percentage of the physical port's bandwidth that the logical port may use. The Logical Port still gets its desired capacity, but if free resources are available, it can go up to this setting. Platform firmware provides a “best effort” to enable the maximum percentage for the logical port. The value can be between 1 and 100, and the default is 100. It can be configured with the HMC CLI and the REST API (not in the GUI), and it cannot be changed dynamically.

To check whether the adapter supports max capacity: # lshwres -m my_p9 -r sriov --rsubtype adapter
…,custom_max_capacity_capable=1,…
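When this check is scripted, the lshwres output line has to be split into attribute=value pairs. A minimal Python sketch, assuming that pairs whose value contains commas are wrapped in double quotes (so the csv module can split them); the helper names and the sample line are made up.

```python
import csv


def parse_hmc_attrs(line):
    """Parse one line of HMC lshwres output into a dict of attribute -> value.

    Assumes comma-separated attr=value pairs, with pairs whose value contains
    commas wrapped in double quotes.
    """
    attrs = {}
    for field in next(csv.reader([line])):
        key, sep, value = field.partition("=")
        if sep:  # skip anything that is not an attr=value pair
            attrs[key] = value
    return attrs


def supports_max_capacity(line):
    """True if the adapter line reports custom_max_capacity_capable=1."""
    return parse_hmc_attrs(line).get("custom_max_capacity_capable") == "1"
```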

Specify max capacity: # chhwres -m my_p9 -r sriov --rsubtype logport --id 2 -o a -a "adapter_id=1,phys_port_id=0,logical_port_type=eth,capacity=10,max_capacity=20"

List the max capacity of an SR-IOV logical port: # lshwres -m alpfp094 -r sriov --rsubtype logport --level eth
…,max_capacity=20,…

Specify max capacity for vNIC:
chhwres -m my_p9 -r virtualio --rsubtype vnic --id 2 -o a -a "backing_devices=\"sriov/my-vios/1/1/2/10.00/20/60.0,sriov/my-vios2/2/1/3/8.00/40/80.0\""

Display max capacity of vNIC backing devices: # lshwres -m my_p9 -r virtualio --rsubtype vnic --filter "lpar_ids=2"
…backing_devices="sriov/my-vios/1/1/2/27008014/10.0/10.0/20/60.0/60.0"…

Max capacity is displayed twice (60.0/60.0): the first value is the current max capacity, the second is the desired max capacity (also called original max capacity), which is used during migration/remote restart to restore the original max capacity.
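To read these values from a script, the displayed backing_devices value can be split on "," and "/". A Python sketch, assuming the field order of the sample output above. The field names are descriptive labels chosen for this sketch, not official HMC attribute names, and some positions are inferred: 27008014 is taken to be the logical port ID, 10.0/10.0 the current/desired capacity (by analogy with max capacity), and 20 the failover priority (it matches the 20 in the chhwres command).

```python
# Field order assumed from the sample lshwres output; names are descriptive
# labels for this sketch, not official HMC attribute names.
FIELDS = ("vios_name", "vios_id", "adapter_id", "phys_port_id",
          "logical_port_id", "capacity", "desired_capacity",
          "failover_priority", "max_capacity", "desired_max_capacity")


def parse_backing_devices(value):
    """Split a displayed backing_devices value into one dict per device."""
    devices = []
    for entry in value.strip('"').split(","):
        kind, *fields = entry.split("/")
        if kind != "sriov" or len(fields) != len(FIELDS):
            raise ValueError(f"unexpected backing device format: {entry!r}")
        devices.append(dict(zip(FIELDS, fields)))
    return devices
```

Values are kept as strings; convert with `float(dev["max_capacity"])` when comparing them.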

------------------------------------

Power9 Enhancements: Enable/Disable Logical Port
Similar to disabling/enabling a virtual Ethernet adapter or a virtual NIC adapter, a native SR-IOV Ethernet logical port can now be disabled/enabled. This is possible only with the HMC CLI (not in the GUI).

To disable:
chhwres -m <system name> -r sriov --rsubtype logport [-p <lpar name> | --id <lpar id>] -o d -a "adapter_id=<adapter_id>,logical_port_id=<logical_port_id>"

To enable:
chhwres -m <system name> -r sriov --rsubtype logport [-p <lpar name> | --id <lpar id>] -o e -a "adapter_id=<adapter_id>,logical_port_id=<logical_port_id>"

To display disable mode:
lshwres -m <system name> -r sriov --rsubtype logport --level eth -F logical_port_id,is_disabled
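The -F output of this command is easy to post-process, for example to list the disabled logical ports. A minimal Python sketch, assuming one "logical_port_id,is_disabled" line per logical port and that is_disabled is printed as 1 (disabled) or 0 (enabled); that value format is an assumption, so check it against real output first.

```python
def disabled_ports(output):
    """Return the logical port IDs marked as disabled in the -F output.

    Assumes one 'logical_port_id,is_disabled' line per logical port and that
    is_disabled is printed as 1 (disabled) or 0 (enabled).
    """
    result = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        port_id, is_disabled = line.split(",")
        if is_disabled == "1":
            result.append(port_id)
    return result
```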

------------------------------------

After the configuration is finished, a new adapter with VF in its name appears at OS level (running cfgmgr may be needed).

# lsdev -Cc adapter
ent0    Available 02-00 4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent1    Available 02-01 4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent2    Available 02-02 4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent3    Available 02-03 4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent4    Available 0C-00 PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)

# entstat -d ent4
...
...
VF Minimum Bandwidth: 24%
VF Maximum Bandwidth: 100%
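To check these values from a script, the entstat output can be parsed for the two VF bandwidth lines. A minimal Python sketch, assuming the lines look exactly like the ones above; VF Minimum Bandwidth presumably reflects the logical port's desired capacity and VF Maximum Bandwidth its max capacity (default 100).

```python
import re


def vf_bandwidth(entstat_text):
    """Extract the 'VF Minimum/Maximum Bandwidth' percentages from entstat -d output."""
    found = dict(re.findall(r"VF (Minimum|Maximum) Bandwidth:\s*(\d+)%", entstat_text))
    return {kind.lower(): int(pct) for kind, pct in found.items()}
```

For the output above, the result would be `{"minimum": 24, "maximum": 100}`.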
