HW - MEMORY

Memory hierarchy

The instructions and data that the CPU processes come from memory. Memory is organized in several layers.
Registers - the top layer: high-speed storage cells inside the CPU (each holds 32 or 64 bits of data)

Caches - If the data is not in a register, the CPU looks in the next level, which is the cache
         L1 cache: the fastest and smallest (usually on the CPU chip), 32-256 KB
         L2 cache: checked if the needed data is not in L1; it can be megabytes in size (for example 4 MB)
         L3 cache: a bit farther away from the core, around 32 MB

RAM - If the needed data is not in the hardware caches, the TLB (Translation Lookaside Buffer) is checked and then RAM is accessed
         TLB - a cache of recently used virtual-to-physical address translations

Disk - If the address is not in RAM, then a page fault occurs and the data is retrieved from the hard disk.
         A page fault is a request to load a 4KB data page from disk.

The way demand paging works is that the kernel only loads a few pages at a time into real memory. When the CPU is ready for another page, it looks at the RAM. If it cannot find it there, a page fault occurs, and this signals the kernel to bring more pages into RAM from disk.
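
The demand paging behaviour described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the FIFO replacement policy is an assumption chosen for simplicity, not the policy AIX actually uses.

```python
from collections import deque

PAGE_SIZE = 4096  # 4 KB pages, as described above


def page_number(address):
    """Return the number of the 4 KB page that contains a byte address."""
    return address // PAGE_SIZE


def count_page_faults(accesses, frames):
    """Count page faults for a sequence of page numbers.

    `frames` is the number of page frames available in RAM.
    Replacement is plain FIFO (an assumption made for this sketch).
    """
    resident = deque()  # pages currently in RAM, oldest first
    faults = 0
    for page in accesses:
        if page in resident:
            continue  # page is already in RAM, no fault
        faults += 1  # page must be brought in from disk
        if len(resident) == frames:
            resident.popleft()  # make room by dropping the oldest page
        resident.append(page)
    return faults
```

With the access sequence 1, 2, 3, 1, 4, 1, 2 the sketch reports 6 faults with 3 frames and 4 faults with 4 frames, which shows why more real memory means fewer trips to disk.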

If the CPU is waiting for data from real memory, it is still considered to be in busy state. If the data is needed from disk, the CPU is in I/O wait state.

-------------------------------------

L2 Cache and performance:

L2 cache is fast memory that stores copies of the data from the most frequently used main memory locations.


First picture shows a Power5 system, second picture Power6 (or Power7) system.

In Power5 systems, there is a single L2 cache on a processor chip, shared by both cores on the chip. Later servers (Power6 and Power7) have a separate L2 cache for each core on a chip. A partition's performance depends on how efficiently it uses the processor cache.

If L2 cache interference with other partitions is minimal, the partition performs much better. Sharing the L2 cache with other partitions means the processor's most frequently accessed data may be flushed out of the cache, and fetching it again from L3 cache or from main memory takes more time.

 -------------------------------------

Paging Space (also called Swap space)

The RAM and the paging space are divided into 4 KB sections called page frames. (A page is a unit of virtual memory that holds 4 KB of data.) When the system needs more RAM, page frames of information are moved out of RAM and onto the hard disk. This is called paging out. When those page frames of information are needed again, they are taken from the hard disk and moved back into the RAM. This is called paging in.

When the system spends more time shuffling page frames in and out of RAM than doing useful work, the system is thrashing.

When the amount of available paging space falls below a threshold, called the paging space warning level, the system sends all processes (except the kernel processes) a SIGDANGER signal. This signal warns processes that paging space is running low, so that they can free memory or terminate gracefully; a process that does not handle the signal ignores it by default. When the amount of free paging space falls further, below a second threshold called the paging space kill level, the system sends a SIGKILL signal to the processes that are using the most paging space, which terminates them immediately (not gracefully).
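
The two thresholds can be pictured with a small Python sketch. The function name and the numbers in the usage note are made up for illustration; on AIX the real thresholds are the vmo tunables npswarn and npskill, and the kernel's actual selection of processes is more involved than this.

```python
def paging_space_action(free_pages, warn_level, kill_level):
    """Return which signal the system would send for a given amount of
    free paging space (all three values are counts of 4 KB pages).

    Simplified model of the two thresholds described above.
    """
    if free_pages < kill_level:
        return "SIGKILL"    # below the paging space kill level
    if free_pages < warn_level:
        return "SIGDANGER"  # below the paging space warning level
    return "none"           # enough free paging space
```

For example, with a hypothetical warning level of 512 pages and kill level of 128 pages, 300 free pages would trigger SIGDANGER and 100 free pages would trigger SIGKILL.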

When AIX is installed, it automatically creates paging space on the installation disk, which is usually the hard disk hdisk0. The name of this paging space is always hd6. The file /etc/swapspaces contains a list of the paging space areas that will be activated at system startup.

swapon is a term from the days before page frames were used. At that time, around 1982, UNIX systems swapped entire programs out of RAM and onto the hard disk. Today, a portion of the program is left in RAM, and the rest of the program is paged out onto the hard disk. The term swapon stuck, so today we sometimes refer to paging out and paging in as swapping.

Once a computational page has been paged out, it continues to take up space in the paging space for as long as the process exists, even if the page is subsequently paged back in. In general, you should avoid paging altogether.

How much paging space do you need on your system? What is the rule of thumb?
Database administrators usually like to request the highest number of everything and might instruct you to make the paging space twice the size of your RAM (the old rule of thumb). Generally speaking, if my system has more than 4 GB of RAM, I usually like to create a one-to-one ratio of paging space to RAM. Monitor your system frequently after going live. If you see that you are never really approaching 50 percent paging space utilization, don't add the space.

The number and types of applications will dictate the amount of paging space needed. Many sizing “rules of thumb” have been published, but the only way to correctly size your machine's paging space is to monitor the amount of paging activity.
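
As a rough sketch of that monitoring step, the following Python reads the summary that lsps -s prints and applies the 50 percent guideline mentioned above. The sample text is illustrative (not taken from a real system) and the function names are invented for this example.

```python
# Illustrative sample in the layout printed by `lsps -s` (values are made up)
SAMPLE = """Total Paging Space   Percent Used
      4096MB               12%
"""


def parse_lsps_summary(text):
    """Return (total_mb, percent_used) from `lsps -s` style output."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    size_field, used_field = lines[-1].split()
    if not size_field.endswith("MB") or not used_field.endswith("%"):
        raise ValueError("unexpected lsps -s layout")
    return int(size_field[:-2]), int(used_field[:-1])


def should_consider_adding_space(percent_used, threshold=50):
    """Apply the guideline: only think about adding paging space when
    utilization gets near the threshold (50 percent by default)."""
    return percent_used >= threshold
```

Running the parser on output collected at regular intervals and keeping the highest percentage gives a simple picture of whether the current size is enough.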

Tips for paging space:
- Only 1 paging space per disk
- Use disks with the least activity
- Paging spaces should be the same size
- Do not extend a paging space across multiple PVs

Ideally, there should be several paging spaces of equal size, each on a different physical volume. Paging space is allocated in a round robin manner, so all paging areas are used equally. If you have two paging areas on one disk, you are no longer spreading the activity across several disks. Because of the round robin technique, paging space usage will not be balanced if the paging spaces are not the same size.
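
The effect of unequal sizes can be shown with a simplified Python model of round robin allocation. It hands out one page at a time, which is an assumption made for this sketch and not a statement about the real allocation granularity in AIX; the function names are invented.

```python
def round_robin_fill(sizes, pages_needed):
    """Distribute pages over paging spaces in round robin order.

    `sizes` lists the capacity of each paging space in pages.
    Returns the number of pages used in each paging space.
    """
    if pages_needed > sum(sizes):
        raise ValueError("not enough paging space")
    used = [0] * len(sizes)
    i = 0
    while pages_needed > 0:
        if used[i] < sizes[i]:  # skip paging spaces that are already full
            used[i] += 1
            pages_needed -= 1
        i = (i + 1) % len(sizes)
    return used


def percent_used(used, sizes):
    """Utilization of each paging space in percent (rounded)."""
    return [round(100 * u / s) for u, s in zip(used, sizes)]
```

Paging out 150 pages to spaces of 100 and 300 pages puts 75 pages in each, so the small one is 75 percent full while the large one is only 25 percent full. With two spaces of 200 pages each, 100 pages leave both at 25 percent.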


bootinfo -r                    displays the real memory in kilobytes (this also works: lsattr -El sys0 -a realmem)
lscfg -vp |grep -p DIMM        displays the memory DIMMs


lsattr -El sys0 -a realmem     (list attributes) see how much real memory you have
ps aux | sort +4 -n            lists how much mem is used by the processes
svmon -P | grep -p <pid>       shows how much paging space a process is using
svmon -P -O sortseg=pgsp       shows paging space usage of processes

mkps -s 4 -n -a rootvg hdisk0  creates a paging space (the name is assigned automatically: paging00)
            -n                 activates it immediately
            -a                 activates it at every restart as well (adds it to /etc/swapspaces)
            -s                 size in LPs (here 4)
lsps -a                        list all paging spaces and the usage of a paging space
lsps -s                        summary of all paging spaces combined (all the paging spaces are added together)
chps -s 3 hd6                  dynamically increases the size of a paging space by 3 LPs
chps -d 1 paging00             dynamically decreases the size of a paging space by 1 LP (a temporary paging space is created during the operation)

/etc/swapspaces                contains a list of the paging space areas
vmstat -s                      displays paging statistics counted since boot (page ins/outs, paging space page ins/outs)

smitty mkps                    adding paging space
smitty chps                    changing paging space
swapon /dev/paging02           dynamically activate, or bring online, a paging space (or smitty pgsp)
swapoff /dev/paging03          deactivate a paging space

------------------------------

removing a paging space:
    swapoff /dev/paging03      deactivate a paging space (the /dev is needed)
    rmps paging03              removes a paging space (the /dev is not needed)

------------------------------

For flushing the paging space:
(it shows high percentage, but actually nothing is using it)
1. chps -d 1 hd6               decreases the size of the paging space by 1 LP (it will create a temp. paging space, copy the content...)
                               (if not enough space in the vg, it will not do that)
2. chps -s 1 hd6               increase paging space to its original size

------------------------------

Fork:
When a message such as "cannot fork..." appears, it is probably caused by low paging space.

When a process calls fork(), the operating system creates a child process of the calling process.
The child process created by fork() is a sort of replica of the calling process. Some server processes, or daemons, call fork() a few times to create more than one instance of themselves. An example of this is a web server that pre-forks so it can handle a certain number of incoming connections without having to fork() the moment they arrive.
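
A minimal Python sketch of the pre-fork idea, assuming a POSIX system. The function name is invented and the children do no real work; it only shows where fork() is called, where a "cannot fork" error would surface, and how the parent collects its children.

```python
import os


def prefork(worker_count):
    """Fork `worker_count` children and return their exit codes.

    Each child exits immediately with status 0; a real server would
    handle incoming connections in the child instead.
    """
    pids = []
    for _ in range(worker_count):
        try:
            pid = os.fork()
        except OSError as err:
            # fork() fails when the system is short of resources,
            # for example when memory or paging space is low
            print("cannot fork:", err)
            break
        if pid == 0:
            os._exit(0)  # child process: leave without running parent code
        pids.append(pid)  # parent process: remember the child

    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)  # wait for each child to finish
        exit_codes.append(os.WEXITSTATUS(status))
    return exit_codes
```

Calling prefork(3) creates three children up front and returns one exit code per child that could be created.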

When AIX runs out of paging space, it starts to kill processes. Protecting a process (for example ssh) may be important so that the server stays reachable.
The vmo option 'nokilluid' protects the processes of low user IDs:
1. grep ssh /etc/passwd        getting the user id of ssh (in our case it was 202)
2. vmo -o nokilluid=202        user ids lower than this value will not be killed due to low page-space
                               (taken literally, uid 202 itself is covered only if the value is 203 or higher)
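
The "lower than" rule can be written as a one-line check. This is a sketch that takes the wording above literally; the function name is invented for this example.

```python
def is_protected(uid, nokilluid):
    """Return True if processes of this user id are exempt from being
    killed when paging space is low, i.e. if uid is lower than nokilluid."""
    return uid < nokilluid
```

Under this reading is_protected(201, 202) is true but is_protected(202, 202) is false, so it is worth checking that the chosen value is above the user id that should be protected.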

------------------------------

Virtual Memory Management:
(this section also applies to ioo, no, nfso)

vmo -a|egrep "minperm%|maxperm%|maxclient%|lru_file_repage|strict_maxclient|strict_maxperm|minfree|maxfree"

root@aix04: / # vmo -a |grep maxclient   
            maxclient% = 8
      strict_maxclient = 1


root@aix1: /root # vmo -L
NAME                      CUR    DEF    BOOT   MIN    MAX    UNIT           TYPE
     DEPENDENCIES
--------------------------------------------------------------------------------
cpu_scale_memp            8      8      8      1      64                       B
--------------------------------------------------------------------------------
data_stagger_interval     n/a    161    161    0      4K-1   4KB pages         D
     lgpg_regions
--------------------------------------------------------------------------------

D = Dynamic: can be freely changed
B = Bosboot: can only be changed using bosboot and reboot
S = Static: the parameter can never be changed
R = Reboot: the parameter can only be changed during boot


/etc/tunables/nextboot        <--values to be applied to the next reboot. This file is automatically applied at boot time.
/etc/tunables/lastboot        <--automatically generated at boot time. It contains the parameters, with their values after the last boot.
/etc/tunables/lastboot.log    <--contains the logging of the creation of the lastboot file, that is, any parameter change made is logged


vmo -o maxperm%=80            <--sets to 80

vmo -p -o maxperm%=80 -o maxclient%=80      <--sets maxperm% and maxclient% to 80
                             (-p: sets both current and reboot values: updates the current value and /etc/tunables/nextboot)

vmo -r -o lgpg_size=0 -o lgpg_regions=0     <--sets only in the nextboot file, so it takes effect after reboot

------------------------------

SAP Note 973227 recommendations:
    minperm% = 3
    maxperm% = 90
    maxclient% = 90
    lru_file_repage = 0
    strict_maxclient = 1
    strict_maxperm = 0
    minfree = 960
    maxfree = 1088
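
To check a system against this list, the "name = value" lines printed by vmo -a can be compared with the recommended values. The following Python is a sketch: the function names are invented, and the sample text only repeats the two example lines shown earlier in this section.

```python
# Recommended values copied from the list above
RECOMMENDED = {
    "minperm%": 3, "maxperm%": 90, "maxclient%": 90, "lru_file_repage": 0,
    "strict_maxclient": 1, "strict_maxperm": 0, "minfree": 960, "maxfree": 1088,
}

# Two lines in the layout printed by `vmo -a`
SAMPLE = """            maxclient% = 8
      strict_maxclient = 1
"""


def parse_vmo(text):
    """Turn `name = value` lines into a dict, keeping integer values only."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value.strip())
    return values


def differences(current, recommended=RECOMMENDED):
    """Return {name: (current, recommended)} for every tunable that is
    present in `current` and differs from the recommendation."""
    return {
        name: (current[name], want)
        for name, want in recommended.items()
        if name in current and current[name] != want
    }
```

For the sample, only maxclient% is reported, with the current value 8 against the recommended 90.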

--------------------------------------

5 comments:

  1. Hi, my system is using 99.2% of its memory. What needs to be done, and how do I figure out what is causing memory usage to reach 99%?

    If it is the file system cache, how do I check it and resolve it? Please help.

  2. Hi,

    In my server the paging space utilisation is 90%, my question is how can I check which particular processes are consuming more Paging space on the server?

    Thanks,
    Sathish.

    Replies
    1. Hi, here you can find some info: http://aix4admins.blogspot.hu/2011/09/memory-leak-caused-by-program-that.html

  3. If you want to remove hd6 you can, although I never recommend it. To do that, create a secondary paging space on the system and activate it with the swapon command, then run swapoff /dev/hd6. All data from hd6 is moved to the secondary paging space, and this may take some time (it depends on how much data is in the paging space). Then check the status of hd6: if it is off, use the rmps command to remove hd6; if it is not off, use the swapoff command to turn it off. Now you need to change the boot script which activates hd6 in the boot process and put the secondary paging space in place of hd6, otherwise hd6 will be created again when you reboot the system. Regards
