PERF. - VMM

VMM concepts
(Some AIX documentation distinguishes working and persistent memory segments, but elsewhere (for example in svmon) "persistent storage" refers to the JFS file cache only. Because of this double meaning of the word "persistent", I prefer the terms working and permanent storage (as in numperm), and I reserve the word persistent for the JFS file cache.)

Virtual-memory segments are partitioned in units called pages; each page is either located in real physical memory (RAM) or stored on disk until it is needed. AIX uses virtual memory to address more memory than is physically available in the system. The management of memory pages in RAM or on disk is handled by the VMM.

A page is a fixed-size block of data (usually 4096 bytes). A page might be resident in memory (that is, mapped into a location in physical memory), or a page might be resident on a disk (that is, paged out of physical memory into paging space or a file system).

The pagesize command shows the page size used by the system:
$ pagesize
4096

The VMM maintains a free list of available page frames. The VMM also uses a page-replacement algorithm to determine which virtual-memory pages currently in RAM will have their page frames reassigned to the free list.

AIX tries to use all of RAM all of the time, except for a small amount which it maintains on the free list. To maintain this small amount of unallocated pages the VMM uses page outs and page steals to free up space and reassign those page frames to the free list.

overhead             -- The load that AIX incurs while sharing resources between user processes and performing its internal accounting.
page                 -- A fixed-size (4 KB) block of memory.
page fault           -- Occurs when a process tries to access an address in virtual memory that does not have a location in physical memory.
                        In response, the system tries to load the appropriate data from the hard disk.
page stealing daemon -- The daemon responsible for releasing pages of memory for use by other processes.
                        (It makes room for incoming pages by swapping out memory pages that are not part of the working set of a process.)
paging in            -- Reading pages from disk (paging space or a file system) back into physical memory.
paging out           -- Writing pages from physical memory to disk so that their page frames can be released for other use.

The kernel continuously checks whether the number of pages on the free list is below a threshold. If so, the page stealing daemon becomes active and begins copying pages to the swap area, starting with the least recently used pages. Each page placed on the free list then becomes available for use by other processes. Pages written out to swap must be read back into physical memory when the process needs them again.

The AIX VMM integrates cached file data with the management of other types of virtual memory (for example, process data, process stack, and so forth). It caches the file data as pages, just like virtual memory for processes. (In most modern computer systems, each thread has a reserved region of memory referred to as its stack.)

------------------

Working Storage

Working storage pages are pages that contain volatile data (in other words, data that is not preserved across a reboot). Examples are process data, stack, shared memory, and kernel data.

When modified working storage pages need to be paged out (moved from memory to the disk), they are written to paging space. Working storage pages are never written to a file system.

When a process exits, the system releases all of its private working storage pages. Thus, the working storage pages holding the data and stack of a process are released when the process exits.


Permanent Storage

Permanent storage pages are pages that contain permanent data (that is, data that is preserved across a reboot). This permanent data is just file data. So, permanent storage pages are basically just pieces of files cached in memory.

When a modified permanent storage page needs to be paged out (moved from memory to disk), it is written to a file system.

You can divide permanent storage pages into two sub-types:

    - Non-client pages (aka persistent pages): these are pages containing cached Journaled File System (JFS) file data
    - Client pages: these are pages containing cached data for all other file systems (for example, JFS2 and Network File System (NFS))
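
The terminology above can be summarized in a few lines of Python. This is only a sketch of the classification used in this article, not something AIX exposes; the function names and string labels are made up for illustration.

```python
def storage_subtype(fs_type):
    """Map a file system type to the permanent-storage sub-type described above."""
    # JFS pages are "persistent" (non-client); every other file system is "client".
    return "persistent" if fs_type.lower() == "jfs" else "client"


def pageout_destination(storage_class):
    """Where a modified page is written when it is paged out."""
    destinations = {
        "working": "paging space",    # never written to a file system
        "permanent": "file system",   # written back to the file it caches
    }
    return destinations[storage_class]


for fs in ("jfs", "jfs2", "nfs"):
    print(fs, "->", storage_subtype(fs))
print("working ->", pageout_destination("working"))
print("permanent ->", pageout_destination("permanent"))
```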

------------------

In order to help optimize which pages are selected for replacement by the page replacement daemons, AIX classifies pages into one of two types:

    - Computational pages: pages used for the text, data, stack, and shared memory of a process
    - Non-computational pages: pages containing file data for files that are being read and written.

All working storage pages are computational. A working storage page is never marked as non-computational.

Depending on how the permanent storage pages are used, they can be computational or non-computational. If a file contains executable text for a process, the system treats the file as computational and marks all of the permanent storage pages in the file as computational. If the file does not contain executable text, the system treats the file as non-computational and marks all of its pages as non-computational. (Basically, every file starts as non-computational. When there is a reference to a memory block that contains instructions (not just data), that block and all other blocks belonging to the file are marked as computational.)

Once a file has been marked as computational, it remains marked as a computational file until the file is deleted (or the system is rebooted). Thus, a file remains marked as computational even after it is moved or renamed.
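
The "sticky" behaviour of the computational flag can be modelled as a tiny state machine. The class below is a hypothetical illustration of the rule described above, not kernel code; the class name, method names, and the file path are invented.

```python
class CachedFile:
    """Toy model of how a cached file is classified."""

    def __init__(self, name):
        self.name = name
        self.computational = False  # every file starts as non-computational

    def reference(self, executes_text):
        # A reference that executes instructions marks the whole file computational.
        if executes_text:
            self.computational = True

    def rename(self, new_name):
        # Renaming or moving the file does not reset the flag.
        self.name = new_name


f = CachedFile("/opt/app/bin/server")      # hypothetical path
f.reference(executes_text=False)           # plain read: still non-computational
flag_after_read = f.computational
f.reference(executes_text=True)            # executed once: now computational
f.rename("/opt/app/bin/server.old")
print(flag_after_read, f.computational)
```

In this model only deleting the file (or a reboot) would create a fresh object with the flag cleared again.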


------------------

Page replacement

The AIX page replacement daemons scan memory a page at a time to find pages to evict in order to free up memory. The page replacement daemons must choose pages carefully to minimize the performance impact of paging on the system, and the page replacement daemons target pages of different classes based on tunable parameter settings and system conditions.

There are a number of tunable parameters that you can use to control how AIX selects pages to replace.

------------------

minperm and maxperm

These tunable parameters are used to indicate how much memory the AIX kernel should use to cache non-computational pages. The maxperm tunable parameter indicates the maximum amount of memory that should be used to cache non-computational pages. The minperm limit indicates the target minimum amount of memory that should be used for non-computational pages.

By default, maxperm is an "un-strict" limit, so it allows more non-computational files to be cached in memory when there is available free memory. The maxperm limit can be made a "strict" limit by setting the strict_maxperm tunable parameter to 1.
(The disadvantage of this is, that the number of non-computational pages cannot grow beyond maxperm and consume more memory when there is free memory on the system.)

numperm (lru_file_repage)

The number of non-computational pages is referred to as numperm: The vmstat -v command displays the numperm value for a system as a percentage of a system’s real memory.

When the number of non-computational pages (numperm) is greater than maxperm, the AIX page replacement daemons strictly target non-computational pages (for example, cached files that are not executables).

When the number of non-computational pages (numperm) is less than minperm, the AIX page replacement daemons target both computational and non-computational pages. In this case, AIX scans both classes of pages and evicts the least recently used pages.

When the number of non-computational pages (numperm) is between minperm and maxperm, the lru_file_repage tunable parameter (lru stands for least recently used) controls what kind of pages the AIX page replacement daemons steal. If lru_file_repage is set to 0, AIX always targets non-computational pages when numperm is between minperm and maxperm.

In most customer environments, it is best to have the kernel always target non-computational pages, because paging out computational pages (for example, a process’s stack and data) usually costs a process far more than paging out non-computational pages (that is, cached file data). For this reason, lru_file_repage is usually set to 0.
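
The three cases above can be written down as a small decision function. This is a simplified sketch of the rules as described in this article, not the kernel's actual algorithm. The function name is made up, values exactly equal to a limit are treated as "between" (an assumption), and the lru_file_repage=1 case is left open because it is not described here.

```python
def replacement_target(numperm, minperm, maxperm, lru_file_repage=0):
    """Return which page class the page replacement daemons target (toy model).

    All three limits are percentages, as shown by vmstat -v.
    """
    if numperm > maxperm:
        return "non-computational"
    if numperm < minperm:
        return "computational and non-computational"
    # numperm is between minperm and maxperm
    if lru_file_repage == 0:
        return "non-computational"
    return "decided by the kernel (not covered in this article)"


# Example values: minperm=3, maxperm=90
for numperm in (95.0, 22.5, 2.0):
    print(numperm, "->", replacement_target(numperm, 3, 90, lru_file_repage=0))
```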

------
maxclient

maxclient specifies a limit on the maximum amount of memory that should be used to cache non-computational client pages. Because all non-computational client pages are a subset of the total number of non-computational permanent storage pages, the maxclient limit must always be less than or equal to the maxperm limit.

numclient

The number of non-computational client pages is referred to as numclient. The vmstat -v command displays the numclient value for a system as a percentage of a system’s real memory.

By default, the maxclient limit is a strict limit. This means that the AIX kernel does not allow the non-computational client file cache to exceed the maxclient limit (that is, the AIX kernel does not allow numclient to exceed maxclient). When numclient reaches the maxclient limit, the AIX page replacement daemons strictly target client pages.

------

minfree, maxfree

Two other important parameters are minfree and maxfree. If the number of pages on the free list (the "free pages" line of vmstat -v) falls below minfree, the VMM starts to steal pages just to replenish the free list, which adds overhead. It continues to steal pages until the free list holds at least maxfree pages.
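
A minimal sketch of this minfree/maxfree behaviour, under the assumption that stealing simply refills the free list up to maxfree. The function name is hypothetical, and 960 and 1088 are used only as example values for minfree and maxfree.

```python
def pages_to_steal(free_pages, minfree, maxfree):
    """Number of pages the VMM has to steal in this simplified model."""
    if free_pages < minfree:
        # Stealing starts below minfree and continues until maxfree is reached.
        return maxfree - free_pages
    return 0  # free list is large enough, nothing to do


print(pages_to_steal(900, minfree=960, maxfree=1088))    # below minfree
print(pages_to_steal(1000, minfree=960, maxfree=1088))   # between the two limits
```

Note that a free list between minfree and maxfree does not trigger stealing by itself; the gap between the two values keeps the page stealer from starting and stopping constantly.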

------

# vmstat -v 
       4980736 memory pages
        739175 lruable pages
        432957 free pages                     <--6 digit generous, 5 digit ideal, 4 digits trouble, 3 digits big trouble
             1 memory pools
         84650 pinned pages
          80.0 maxpin percentage
          20.0 minperm percentage
          80.0 maxperm percentage
           2.2 numperm percentage             <--% of memory containing non-comp. pages (jfs, jfs2, nfs)
         16529 file pages                     <--# of non-comp. pages
           0.0 compressed percentage
             0 compressed pages
           2.2 numclient percentage           <--% of memory containing non-comp. client pages (jfs2, nfs)
          80.0 maxclient percentage
         16503 client pages                   <--# of non-comp client pages


So, in the above example, there are 16529 non-computational file pages mapped into memory, which is the 2.2 percent reported as numperm (in this sample, 16529 file pages out of 739175 lruable pages). Of these 16529 non-computational file pages, 16503 are client pages.
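
The percentages can be checked against the page counts with a few lines of Python. The sketch below parses a shortened copy of the sample output above (with the inline annotations removed); the parser and its names are illustrative only. In this sample the reported numperm of 2.2 matches file pages divided by lruable pages, not file pages divided by total memory pages.

```python
SAMPLE = """\
       4980736 memory pages
        739175 lruable pages
        432957 free pages
          20.0 minperm percentage
          80.0 maxperm percentage
           2.2 numperm percentage
         16529 file pages
           2.2 numclient percentage
          80.0 maxclient percentage
         16503 client pages
"""


def parse_vmstat_v(text):
    """Turn 'value label' lines into a {label: value} dictionary."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        value, label = parts
        try:
            stats[label.strip()] = float(value)
        except ValueError:
            continue  # skip lines that do not start with a number
    return stats


stats = parse_vmstat_v(SAMPLE)
pct_of_lruable = 100 * stats["file pages"] / stats["lruable pages"]
pct_of_memory = 100 * stats["file pages"] / stats["memory pages"]
print(round(pct_of_lruable, 1), round(pct_of_memory, 1))
```

Which denominator a tool uses matters when comparing tools: dividing the same file page count by total memory pages instead of lruable pages gives a noticeably smaller percentage.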

The vmstat output does not provide information about computational file pages. Information about computational file pages can be gathered from the svmon command.

# svmon -G                <--in memory pages of each type (work, pers., client)
               size      inuse       free        pin    virtual
memory       786432     209710     576722     133537     188426
pg space     131072       1121

               work       pers       clnt
pin          133537          0          0
in use       188426          0      21284

    - work: working storage
    - pers: persistent storage (persistent storage pages are non-client pages - that is, JFS pages.)
    - clnt: client storage (jfs2, nfs)

For each page type, svmon displays two rows:

    - in use: number of 4K pages mapped into memory
    - pin: number of 4K pages mapped into memory and pinned (pin is a subset of inuse)

So, in the above example, there are 188426 working storage pages mapped into memory. Of those 188426 working storage pages, 133537 of them are pinned (that is, can’t be paged out).

There are no persistent storage pages (because there are no JFS filesystems in use on the system). There are 21284 client storage pages, and none of them are pinned.

The svmon command does not display the number of permanent storage pages, but it can be calculated from the svmon output. As mentioned earlier, the number of permanent storage pages is the sum of the number of persistent storage pages and the number of client storage pages. So, in the above example, there are a total of 21284 permanent storage pages on the system:

0 persistent storage pages + 21284 client storage pages = 21284 permanent storage pages

The information reported by svmon differs slightly from that of vmstat. svmon reports the number of in-memory pages of each type: working, persistent (that is, non-client), and client. It does not distinguish computational from non-computational pages; it just reports the total number of in-memory pages of each page type.

In contrast, vmstat reports information about non-computational versus computational pages.

To illustrate this difference, consider the above example of svmon output. Some of the 21284 client pages will be computational, and the rest of the 21284 client pages will be non-computational. To determine the breakdown of these client pages between computational and non-computational, use the vmstat command to determine how many of the 21284 client pages are non-computational.
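
As a worked example of this arithmetic, the snippet below combines the svmon and vmstat figures shown earlier. It assumes, purely for illustration, that both outputs were taken on the same system at the same moment, which the samples above do not guarantee; the variable names are made up.

```python
# "in use" row of svmon -G
svmon_in_use = {"work": 188426, "pers": 0, "clnt": 21284}
# "client pages" line of vmstat -v (non-computational client pages)
vmstat_client_pages = 16503

# Permanent storage = persistent (JFS) pages + client pages
permanent_pages = svmon_in_use["pers"] + svmon_in_use["clnt"]

# Client pages that vmstat does not count are the computational ones
computational_client_pages = svmon_in_use["clnt"] - vmstat_client_pages

print(permanent_pages, computational_client_pages)
```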


-----------
suggested:

lru_file_repage = 0
maxperm = 90%
maxclient = 90%
minperm = 3%
strict_maxclient = 1 (default)
strict_maxperm = 0 (default)

# vmo -p -o lru_file_repage=0 -o maxclient%=90 -o maxperm%=90 -o minperm%=3
# vmo -p -o strict_maxclient=1 -o strict_maxperm=0

The above tunable parameter settings are the default settings for AIX Version 6.1.

With these settings, computational pages are not paged out to disk as long as computational memory stays below 97% (minperm=3%). If computational memory exceeds 97%, the system no longer distinguishes between page types, and the least recently used pages (computational or non-computational) are paged out.

-----------------------------

An example:

topas:

 MEMORY
 Real,MB   26623
 % Comp     57          <--this is used for processes (OS+appl.), if you add nmon Process+System, for me it was the same (46+11)
 % Noncomp  22          <--fs cache
 % Client   22          <--fs cache (for jfs2)


nmon:

 FileSystemCache
 (numperm) 22.5%        <--this is for fs cache
 Process   46.0%        <--this is for appl. processes
 System    11.3%        <--this is for the OS
 Free      20.2%        <--free
           -----
 Total    100.0%
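
The relationship between the two tools in this example is simple arithmetic, shown here as a quick check. The dictionary just restates the nmon figures above; nothing is read from a live system.

```python
nmon = {"FileSystemCache": 22.5, "Process": 46.0, "System": 11.3, "Free": 20.2}
topas_comp = 57  # "% Comp" from topas

# topas "% Comp" corresponds to nmon Process + System
comp_from_nmon = nmon["Process"] + nmon["System"]
total = sum(nmon.values())

print(round(comp_from_nmon, 1), round(total, 1))
```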

-----------------------------

18 comments:

  1. Hello,
    While installing oracle on the AIX 5.3 I get error like physical memory can not access the node(loopback node).

  2. Hello,
    Can you please give some more details? (What was the exact error message what you received?)

  3. Hello,

    I am facing one issue in my program.I have allocated some amount of buffer while starting the program and pinned the memory segment. my intention is for the successive memory allocations in my program, it should suppose allocate from the buffer pool which i have pinned. But it's allocating outside of my pinned memory. any idea how to make my program to allocate memory from the buffer pool for the sucessive memory allocations..

  4. Hello,

    I gathered these infos:
    User applications may pin memory through several different mechanisms. Applications can use the plock(), mlock(), and mlockall() subroutines to pin application memory.

    An application can explicitly pin shared memory regions by specifying the SHM_LOCK option to the shmctl() subroutine. An application can also pin a shared memory region by specifying the SHM_PIN flag to shmget().

    In the manual of "vmo" I have found this:
    v_pinshm: If set to 1, will allow pinning of shared memory segments.

    A value of 0 indicates off. ..Useful only if the application also sets the SHM_PIN flag when doing a shmget() call and if doing async I/O from shared memory segments.

    There are other tunables in vmo, but I think these are related to your question. I hope this helps.

  5. Hi we are facing an issue on AIX6.1
    The box is used as a DB2 server. When we process data, everything runs fine. The FileSystemCache shows around 18% usage for Process and System around 2%. The Physical memory usage shows around 18% usage. But when we try to fetch some data, the FileSystemCache shoots up to 97% and physical memory usage goes up to 100% and the system runs out of memory. The minper% is 3 and maxperm% is 90. maxclient is 90. Any idea why this might be the case. (The only way to recover from this state is a hard reboot as no process can be spawned even to kill db2)

  6. Hi, it sounds to me that when this happens you are getting paging space activity and paging space gets full after a while. (I guess that's why no new process can be spawned.) To avoid the reboot, the size of paging space should be increased, but this does not handle the source of the problem.

    So, obviously memory looks like a bottleneck, but some more checks would also be good when that happens: I would check page in/out and file reads/writes with vmstat (vmstat -Iwt 2, and check pi/po, fi/fo, and also the size of avm, Active Virtual Memory). Memory overcommitment is the situation where AVM is greater than the installed RAM. So if AVM is greater than the installed RAM, then you need more RAM or should reduce the workload somehow.

  7. Most DAtabase program will grab as much memory as possible in order to run quickly, speak to your DBA's and find out if they can reduce the memory used by DB2

  8. Hi ,

    In which situation maxperm and minperm settings get into action or for which reason these settings to be fine tuned.

    Regards,
    Siva

    Replies
    1. Hi,
      Maxperm indicates maximum amount of memory that should be used to cache non-computational pages.
      Minperm indicates minimum amount of memory that should be used for non-computational pages.
      (Non-computational pages: pages containing file data for files that are being read and written (JFS/JFS2/NFS))

      VMM frequently checks the value of numperm (the number of non-computational pages as a percentage of a system’s real memory).

      When the number of non-computational pages (numperm) is greater than or equal to maxperm, AIX page replacement daemons strictly target non-computational pages (for example, cached files that are not executables).

      When the number of non-computational pages (numperm) is less than or equal to minperm, AIX page replacement daemons target both computational and non-computational pages. In this case, AIX scans both classes of pages and evicts the least recently used pages.

  9. hi ,

    Your blog is excellent ..explanation is great ..

    In simple words can we say like ... As for me, non comp is that memory, that are not changing
    like shared library. It is loaded to RAM only to increase executing speed of application, for faster access
    but nothing changed in that library code.
    and Comp memory is that memory that are changing and cause to dirty pages, caching etc ...

    Is it right ?

    Replies
    1. Hi, I think your approach is correct, some additional info if this helps:
      "If a file contains executable text for a process, the system treats the file as computational and marks all of the permanent storage pages in the file as computational. If the file does not contain executable text, the system treats the file as non-computational file and marks all of the pages in the file as non-computational."

  10. very nice blog...
    I believe the info for this para is vice versa..
    "If a file contains executable text for a process, the system treats the file as computational and marks all of the permanent storage pages in the file as computational. If the file does not contain executable text, the system treats the file as non-computational file and marks all of the pages in the file as non-computational."
    I may be wrong but till now I have heard... executables are part of non-computational and non executable are part of computational.
    Please correct me if I am wrong..
    Thanks :)

    Replies
    1. hi, you can check here as well: http://www.ibm.com/developerworks/aix/library/au-vmm/

  11. hi,i am a newer for AIX/Unix,are there any books for Unix Kernel or AIX Kernel,i want to learn Unix Kernel.But i didn't find any book.Thanks,

  12. We have two node RAC but the first instance "t24db1" consumes all of the assigned memory, while other node remains on 60% utilization.

  13. Hi Balazs,

    Commendable and much appreciated , very tiny and yet very useful . I must say each article of your blog is so clear to understand ..

    Could you please look the mentioned paragraph if that is correct ...

    "Some of the 21284 client pages will be computational, and the rest of the 21284 client pages will be non-computational."



    Thanks a lot for your time .

    Regards
    Manoj Suyal

  14. Hello, Balazs,

    One thing is confusing to me:

    Please see:

    1:root@dhltaixdb02:/var/spool/cron/crontabs # vmstat -v
    18874368 memory pages
    9059495 lruable pages
    9664261 free pages
    6 memory pools
    1114644 pinned pages
    95.0 maxpin percentage
    3.0 minperm percentage
    90.0 maxperm percentage
    21.6 numperm percentage
    1960382 file pages
    0.0 compressed percentage
    0 compressed pages
    21.6 numclient percentage
    90.0 maxclient percentage
    1960382 client pages
    0 remote pageouts scheduled
    4 pending disk I/Os blocked with no pbuf
    0 paging space I/Os blocked with no psbuf
    2228 filesystem I/Os blocked with no fsbuf
    131265 client filesystem I/Os blocked with no fsbuf
    5100 external pager filesystem I/Os blocked with no fsbuf
    38.4 percentage of memory used for computational pages

    Here the numperm is reported as 21.6%,
    while topas reports %Noncomp=10, and nmon reports numperm=10.4%.

    Since the numperm percentage of vmstat is only non-computational pages, where does it get this additional 11% from?
    I'm running AIX 6100-09-03-1415.

  15. Hi, My AIX server now using 40% MEM for Filesystem cache. If I clear that by change maxperm%, my services will be affected?
