
STORAGE - SAN

SAN (Storage Area Network)

When storage is not attached directly to a server (it is located remotely), the data can be accessed in two ways.
1. File access: All technologies that allow access to files over a network can be considered NAS (Network Attached Storage). Filesystems are defined on external devices (on a NAS server) and shared using protocols such as NFS (on Unix) or CIFS (on Windows).

2. Block access: This uses the SCSI protocol, and disk space is presented as a LUN. Several technologies use this method:
-SAN (Storage Area Network): The SCSI protocol is transferred over Fibre Channel.
-iSCSI (Internet SCSI): The SCSI protocol is transferred over a TCP/IP network.
-FCoE (FC over Ethernet): SCSI is carried in FC frames, which are encapsulated directly in Ethernet frames, without TCP/IP. (Ethernet is a lower layer than TCP/IP.)
-FCIP (Fibre Channel over IP): The SCSI protocol is carried in Fibre Channel frames, which are tunneled over a TCP/IP network.

------------------------------------------

SAN

SAN is a network of storage devices and switches that are connected to the servers. Each server can access the storage as if it were directly attached to that server. When a server wants to access a storage device on the SAN, it sends out a request consisting of SCSI commands encapsulated in FC frames. The HBA builds these frames and converts the signal from electrical to optical form, according to the rules of the FC protocol. The HBA then transmits the request to the SAN. Depending on the cabling, one of the SAN switches receives the request and forwards it to the storage processor, which passes it to the storage device.

SAN components at Server:
HBA (Host Bus Adapter): It is located in the server and performs electrical-to-optical signal conversion. Each server connects to the SAN through its HBAs.
HBA driver: It runs on the server and allows the operating system to communicate with the HBA.

SAN components at Storage:
Storage Processor (SP): Storage arrays include SPs. SPs communicate with the disk array and provide the RAID/LUN functionality.
(A storage controller, also known as a "storage processor" or "array controller", is a device that controls storage arrays, e.g. IBM SVC.)
Disk array: A group of multiple disk devices using RAID technology (RAID levels). Data is stored on disk arrays or on tape devices.
LUN (Logical Unit Number): It identifies a single unit of storage, which looks like an ordinary disk from the server side.

SAN components at Fabric:
Fabric is the hardware (switches, cables, routers) that connects servers to storage devices. A SAN is usually divided into two or more fabrics (like Fabric A and Fabric B) to provide redundancy in case of component failures. A SAN fabric consists of:
SAN switch: Connects servers to storage arrays and makes path redundancy possible in the event of a path failure.
Router: Servers can access SCSI disk or tape devices in the SAN through the data routers in the fabric layer. (It is a bridge device.)
Cables: SAN cables are usually fiber optic cables that connect all of the fabric components.
Communications protocol: Fabric components communicate using the FC protocol.


------------------------------------------

Port:

A port is any entity that actively communicates over the network, like a storage port, an HBA port, or a switch port.

N_port is a port on the node (server or storage), also known as Node port.
F_port is a port on the switch that connects to a node (connects to an N_port), also known as Fabric port.
E_port is the connection between two switches, also known as an Expansion port.
EX_port is the connection between a router and a switch. (At switch side it is an E_port, but at router side it is an EX_port.)
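As a rough sketch, the port pairings above can be expressed as a lookup table in Python. The function name link_type and the link descriptions are made up for illustration; the N_port to N_port entry covers the point-to-point topology described below.

```python
# Hypothetical helper: classify a link from the port types at its two ends,
# following the pairings listed above (frozenset makes the order irrelevant).
VALID_LINKS = {
    frozenset({"N_port", "F_port"}): "node to switch",
    frozenset({"E_port"}): "switch to switch (ISL)",  # E_port <-> E_port
    frozenset({"EX_port", "E_port"}): "router to switch",
    frozenset({"N_port"}): "node to node (point-to-point)",
}

def link_type(end_a: str, end_b: str) -> str:
    """Return a description of the link, or raise ValueError for an invalid pair."""
    key = frozenset({end_a, end_b})
    if key not in VALID_LINKS:
        raise ValueError(f"{end_a} cannot connect to {end_b}")
    return VALID_LINKS[key]

print(link_type("N_port", "F_port"))   # node to switch
print(link_type("E_port", "EX_port"))  # router to switch
```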



FC topologies for connecting ports:
-Point-to-Point (FC-P2P): Two devices are connected directly to each other.
-Arbitrated loop (FC-AL): All devices are in a loop or ring, similar to a Token Ring network. (A port failure causes a break in the ring.)
-Switched fabric (FC-SW): All devices are connected to switches. (Failure of a port does not affect other ports.)

------------------------------------------

WWN (World Wide Name) and WWPN (World Wide Port Name)

Each device in the SAN is identified by a unique World Wide Name (WWN). Each N_Port has its own WWN, called the WWPN, which is a unique identifier for a port on the HBA. The lscfg (or fcstat) command shows the WWPN of the HBA:

Checking WWPN:
# lscfg -vpl fcs* | grep Net
        Network Address.............100000109B21ABEC
        Network Address.............100000109B21ABED
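Many SAN switch interfaces display WWPNs as colon-separated byte pairs (for example 10:00:00:10:9b:21:ab:ec), while lscfg prints them as a plain 16-digit hex string. A small Python sketch that pulls the WWPNs out of the lscfg output above and converts them, which can be handy when preparing zoning on the switch; the function names are hypothetical.

```python
import re

# Output of "lscfg -vpl fcs* | grep Net", copied from the example above.
LSCFG_OUTPUT = """\
        Network Address.............100000109B21ABEC
        Network Address.............100000109B21ABED
"""

def extract_wwpns(text: str) -> list[str]:
    """Return the 16-hex-digit WWPNs found on 'Network Address' lines."""
    return re.findall(r"Network Address\.+([0-9A-Fa-f]{16})", text)

def colon_format(wwpn: str) -> str:
    """Split a WWPN into lowercase, colon-separated byte pairs."""
    wwpn = wwpn.lower()
    return ":".join(wwpn[i:i + 2] for i in range(0, len(wwpn), 2))

for w in extract_wwpns(LSCFG_OUTPUT):
    print(w, "->", colon_format(w))
# 100000109B21ABEC -> 10:00:00:10:9b:21:ab:ec
# 100000109B21ABED -> 10:00:00:10:9b:21:ab:ed
```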

A WWPN identifies a port of the adapter. A dual-port HBA has 2 WWPNs (one for each port) and one WWN for the card itself, the WWNN (World Wide Node Name). (In the example below, the WWNN differs from the WWPN only in its first digit: "1" is changed to "2".)

# fcstat fcs0
FIBRE CHANNEL STATISTICS REPORT: fcs0
Device Type: 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
...
World Wide Node Name: 0x20000000C9A8C4A6
World Wide Port Name: 0x10000000C9A8C4A6

WWPNs are similar to MAC addresses in an Ethernet network: they are assigned by the manufacturer and are guaranteed to be globally unique.

------------------------------------------

FC_ID (Port ID or port address)

WWPNs identify the devices when they are discovered by the switches, but they are not what addresses the frames during data transfer. For that, Fibre Channel uses a second, dynamically assigned address, called the Fibre Channel ID (FC_ID). The FC_ID is assigned by the switch when the device logs into the fabric, and it is mapped to the WWPN. During data transfer in the SAN, the FC_ID is used to address the correct device. You can think of WWNs as MAC addresses and FC_IDs as layer-3 IP addresses. The FC_ID is valid only while the device is logged in.
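The login behaviour described above can be modelled with a toy name-server table in Python. This is a simplified sketch, not how a real switch works: the class ToyFabric is hypothetical, and it assumes the FC_ID is built from the switch's Domain ID and the switch port number with the last byte set to 00. That happens to match the 0x212500 address in the fcstat example below, but it is not a rule every switch follows.

```python
class ToyFabric:
    """Toy model of one switch: assigns FC_IDs at login, forgets them at logout.

    Assumption (illustration only): FC_ID = Domain ID | switch port | 0x00.
    """

    def __init__(self, domain_id: int):
        self.domain_id = domain_id
        self.logged_in: dict[str, int] = {}  # WWPN -> FC_ID

    def login(self, wwpn: str, switch_port: int) -> int:
        # The switch hands out the address; the WWPN is only the lookup key.
        fc_id = (self.domain_id << 16) | (switch_port << 8)
        self.logged_in[wwpn] = fc_id
        return fc_id

    def logout(self, wwpn: str) -> None:
        # The FC_ID is no longer valid once the device logs out.
        self.logged_in.pop(wwpn, None)

    def lookup(self, wwpn: str):
        return self.logged_in.get(wwpn)

fabric = ToyFabric(domain_id=0x21)
print(hex(fabric.login("10000000C9A8C4A6", switch_port=0x25)))  # 0x212500
fabric.logout("10000000C9A8C4A6")
print(fabric.lookup("10000000C9A8C4A6"))  # None
```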

# fcstat fcs0
FIBRE CHANNEL STATISTICS REPORT: fcs0
...
Port FC ID: 0x212500


Or with lsattr:
# lsattr -El fscsi0
scsi_id      0x212500

21: Domain ID (switch ID)
25: Area ID (on many switches this corresponds to the switch port)
00: Port ID
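A 24-bit FC_ID is made of three bytes, usually called Domain, Area and Port. The small Python sketch below splits an FC_ID such as 0x212500 into those bytes; the function name decode_fc_id is hypothetical, and which byte maps to a physical switch port depends on the switch vendor's addressing scheme.

```python
def decode_fc_id(fc_id: int) -> dict:
    """Split a 24-bit FC_ID into its Domain, Area and Port bytes."""
    if not 0 <= fc_id <= 0xFFFFFF:
        raise ValueError("FC_ID must fit in 24 bits")
    return {
        "domain": (fc_id >> 16) & 0xFF,  # identifies the switch
        "area": (fc_id >> 8) & 0xFF,     # often the switch port (vendor dependent)
        "port": fc_id & 0xFF,            # lowest byte
    }

# scsi_id from "lsattr -El fscsi0" is a hex string; int() accepts the 0x prefix.
fields = decode_fc_id(int("0x212500", 16))
print({k: hex(v) for k, v in fields.items()})
# {'domain': '0x21', 'area': '0x25', 'port': '0x0'}
```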

------------------------------------------

Zoning

Zoning provides access control in the SAN. It defines which HBAs can connect to which SPs. When zoning is configured on a SAN switch, devices outside a zone are not visible to the devices inside the zone. In addition, SAN traffic within each zone is isolated from other zones. This is similar to VLANs in Ethernet networks. (You can place different ports of the same SP in different zones to reduce the number of presented paths.)
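Conceptually, a switch with zoning enabled only lets two ports talk if they are members of a common zone. The Python sketch below models zones as sets of WWPNs. The host WWPNs are taken from the lscfg and fcstat examples above; the zone names and the storage-side WWPN 50:00:00:00:00:00:00:01 are made up. Real switches can also define zone members by switch port instead of WWPN.

```python
# Hypothetical zone set: zone name -> member WWPNs.
SP_PORT = "50:00:00:00:00:00:00:01"  # made-up storage processor port
HOST1 = "10:00:00:00:c9:a8:c4:a6"
HOST2 = "10:00:00:10:9b:21:ab:ec"

ZONES = {
    "zone_host1_sp": {HOST1, SP_PORT},
    "zone_host2_sp": {HOST2, SP_PORT},
}

def can_communicate(wwpn_a: str, wwpn_b: str, zones=ZONES) -> bool:
    """Two ports may talk only if at least one zone contains both of them."""
    return any(wwpn_a in members and wwpn_b in members for members in zones.values())

print(can_communicate(HOST1, SP_PORT))  # True
print(can_communicate(HOST1, HOST2))    # False
```

The two hosts share the same storage port, yet they cannot see each other because no zone contains both of them.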

------------------------------------------

LUN Mapping and Masking

Mapping means making devices visible on a storage port. For example, if the LUN address is defined on the director as F0 in hex (240 decimal), the host will discover the device as LUN 240. Masking means that not every attached (= zoned) host can see everything on that storage port: only the masked WWNs can see the LUNs (like a firewall on a network).

For example, we can first map 10 logical devices to the storage port, then perform masking so that host1 can see 3 of the logical devices and host2 can see the other 7.
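The two steps of this example can be sketched in Python: mapping decides which LUNs exist on the storage port, and masking decides which host WWPN may see which of them. The host keys, the LUN numbers 0 to 9 and the function visible_luns are hypothetical; a real array keeps this information in its own configuration, not in code.

```python
# Step 1, mapping (hypothetical): 10 logical devices mapped to one storage port.
mapped_luns = list(range(10))

# Step 2, masking (hypothetical host WWPNs): which host may see which LUNs.
masking = {
    "host1_wwpn": set(mapped_luns[:3]),  # 3 devices
    "host2_wwpn": set(mapped_luns[3:]),  # the other 7
}

def visible_luns(host_wwpn: str) -> list[int]:
    """LUNs a zoned host discovers on this port: mapped AND masked to it."""
    return sorted(set(mapped_luns) & masking.get(host_wwpn, set()))

print(visible_luns("host1_wwpn"))  # [0, 1, 2]
print(visible_luns("host2_wwpn"))  # [3, 4, 5, 6, 7, 8, 9]
print(visible_luns("host3_wwpn"))  # [] (zoned, but not masked)
print(int("F0", 16))               # 240: LUN address F0 hex is seen as LUN 240
```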

With this configuration, multiple access paths can be created to a given LUN. The example below shows 4 paths to hdisk0, 2 paths per HBA. This means that each port on the server side communicates with 2 ports on the storage side.

# lspath | grep hdisk0
Enabled hdisk0 fscsi0
Enabled hdisk0 fscsi0
Enabled hdisk0 fscsi1
Enabled hdisk0 fscsi1
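When a disk has many paths, it can help to count them per adapter to check that they are balanced and that none is in a state other than Enabled. A minimal Python sketch that parses lspath output in the three-column form shown above (status, disk, parent adapter); the function name paths_per_adapter is hypothetical, and lspath output with extra columns would need a different parser.

```python
from collections import Counter

# Output of "lspath | grep hdisk0" from the example above.
LSPATH_OUTPUT = """\
Enabled hdisk0 fscsi0
Enabled hdisk0 fscsi0
Enabled hdisk0 fscsi1
Enabled hdisk0 fscsi1
"""

def paths_per_adapter(text: str, disk: str, status: str = "Enabled") -> dict:
    """Count paths of one disk in the given status, grouped by parent adapter."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == status and fields[1] == disk:
            counts[fields[2]] += 1
    return dict(counts)

print(paths_per_adapter(LSPATH_OUTPUT, "hdisk0"))  # {'fscsi0': 2, 'fscsi1': 2}
```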

------------------------------------------


11 comments:

Unknown said...

Hi Balazs.

I have a problem and I need ideas to resolve it. This is the issue:

I have this configuration:

a) 1 Power 7 rack server with 1 VIOS and 1 LPAR, connected to the DS5020 storage.
b) 1 Power 7 blade server with 1 VIOS and 1 LPAR, connected to the DS5020 storage.
c) 1 Power 5 rack server with 1 VIOS and 1 LPAR, connected to the DS5020 storage.

These 3 servers (Power 7 rack, Power 7 blade, Power 5 rack) are connected to the same IBM DS5020 storage.


Reviewing logs from October 12:

The DS5020 storage has this message: Logical drive not on preferred path. 14 logical drives were moved to the second path.

I have errors on all 3 VIOS servers.

On October 12 at 1:59 am, I lost connection to the virtual disks, and the LPAR located in the Power 7 rack server hung.

On October 12 at 1:59 am, I lost connection to the virtual disks, and the LPAR located in the Power 5 rack server hung.

On October 12 at 1:59 am, I lost connection to the virtual disks, and the LPAR located in the Power 7 BLADE server handled the path change. Only this LPAR survived.


Reviewing logs from October 15:

The DS5020 storage has this message: Logical drive not on preferred path. 10 logical drives were moved to the second path.

I have errors on all 3 VIOS servers.

On October 15 at 1:59 am, the LPAR located in the Power 7 rack server lost connection to the virtual disks and hung.

On October 15 at 1:59 am, the LPAR located in the Power 5 rack server lost connection to the virtual disks and hung.

On October 15 at 1:59 am, the LPAR located in the Power 7 BLADE server lost connection to the virtual disks but handled the path change. Only this LPAR survived and continues working.

I have the errpt logs if you want to see them.

Regards.
Paul.

aix said...

Hi Paul,

this is the first time I have seen this kind of problem, so I can only guess that it is related to the SAN network or the storage system. On Google I found this page: http://www-01.ibm.com/support/docview.wss?uid=swg21412057
To get a proper analysis, I would suggest opening an IBM call (unless someone has a better idea).

Regards,
Balazs

Unknown said...

Hi Balazs

Tks for your answer.

I opened an IBM call 1 week ago, but I still don't have a response. They only asked for the CASD.

The only difference I found is the VIOS ioslevel. The ioslevel on the VIOS that continues working is 2.2.0.12-FP24 SP-02. The other 2 VIOS that hang are at 2.1.3.10-FP23.

Do you think that if I upgrade the VIOS version, I can work around or resolve this problem?

Regards
Paul.



aix said...

Hi Paul,

In 90% of IBM calls, IBM suggests updating to the latest level. It has happened to me in the past that strange errors disappeared after an OS update.
I think doing an upgrade could be a good approach in this case (and if we are lucky, the upgrade will solve the problem).

Regards,
Balazs

Anonymous said...

Nice explanation.

Anonymous said...

thanks for sharing,
any recommended Redbooks for the basics of SAN and AIX?

Anonymous said...

no

Unknown said...

My RAM has failed. How would you replace it?

Anonymous said...

FYI, in my environment (NPIV configuration to multiple Netapp SANs, Cisco FCP switches), I find that the third byte of the scsi_id does not correlate to the vio server as identified in the Zoning section. The first byte does correlate to the switch. I did not attempt to confirm the second byte matching to a port number.

aix said...

Thanks for the info. (I removed that part from this page until it is verified.)

Unknown said...

Thanks in advance.
My SAN controllers restarted. When I checked my Oracle RAC nodes (4 nodes), the PVIDs were corrupted and I was unable to run ASMCMD. Please suggest how to lock the PVIDs for my LUNs (AIX 7.1).