EXTRA - KDB

KDB:

The KDB kernel debugger and the kdb command are useful for debugging device drivers, kernel extensions, and the kernel itself. Although they appear similar, the KDB kernel debugger and the kdb command are two separate tools:

KDB KERNEL DEBUGGER:
It is integrated into the kernel and allows full control of the system while a debugging session is in progress.


KDB COMMAND:
It is implemented as an ordinary user-space program and can be used for analyzing the following:

1. A running system: When used to analyze a running system, the kdb command opens the /dev/pmem special file, which allows direct access to the system's physical memory. The kdb command performs its own address translation internally using the same algorithms as the KDB kernel debugger.

2. A system dump file produced by a previously crashed system: A system dump contains certain critical data structures. Only the memory belonging to the process that was running on the processor that created the dump image can be included in the dump file. When you work with a system dump, any subcommands that modify memory are not valid, because the system dump is merely a snapshot of the real memory in a system.

When you are analyzing a system dump file, the kdb command must be started with arguments that specify the location of the dump file and the kernel file:
# kdb /var/adm/ras/vmcore.0 /unix
(The kernel file is used by the kdb command to resolve symbol names from the dump file.)

------------------------------------

A very valuable benefit of kdb is that a device setting stored in the ODM (shown by lsattr) can be compared with the real-time value used by the running kernel!

------------------------------------

KDB SUBCOMMANDS:

help display context                             lists subcommands with the context "display"
p -?                                             list of parameters for the p subcommand and a brief description
! <command>                                      shell escape (provides a convenient way to run UNIX commands without leaving kdb)
hi                                               print history
lke                                              list loaded extensions
pvol -M <major> -m <minor>                       display physical volume info
stat                                             system status info
status                                           processor status
e                                                exit from kdb

------------------------------------

echo vfcs fcs0 | kdb | grep num_cmd_elems        shows num_cmd_elems in hex on a VIO client with NPIV (compare with the ODM: lsattr -El fcs0)
                                                 (if you change num_cmd_elems with chdev, you can check in kdb whether it has really been changed)
echo scsidisk hdisk0 | kdb | grep queue_depth    shows the real-time value of queue_depth (in hex) for the given disk
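kdb prints these values in hex, while lsattr shows the ODM value in decimal, so a small conversion helps when comparing them. Below is a minimal POSIX shell sketch; the function names kdb_hex2dec and odm_vs_kernel are made up for this example, and it assumes the kernel keeps the attribute value unchanged (this does not hold for num_cmd_elems on a physical FC adapter, see the "+20" rule further down).

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of AIX.
# kdb shows kernel values in hex; lsattr (ODM) shows them in decimal.

# Convert a kdb hex value (with or without a 0x/0X prefix) to decimal.
kdb_hex2dec() {
    h=${1#0x}
    h=${h#0X}
    printf '%d\n' "0x$h"
}

# Compare an ODM value (decimal) with the kernel value reported by kdb (hex).
# Prints "match" or "MISMATCH odm=<dec> kernel=<dec>".
odm_vs_kernel() {
    odm=$1
    kern=$(kdb_hex2dec "$2")
    if [ "$odm" -eq "$kern" ]; then
        echo "match"
    else
        echo "MISMATCH odm=$odm kernel=$kern"
    fi
}
```

For example, if lsattr shows queue_depth 20 and kdb shows 14 (hex), then odm_vs_kernel 20 0x14 prints match. The ODM value alone can be fetched with: lsattr -El hdisk0 -a queue_depth -F value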

------------------------------------

Checking how many virtual processors are active:

root@bb_lpar:/ # echo vpm | kdb                                              <--vpm is a kdb subcommand
...
   0     0  ACTIVE      0 AWAKE        0000000000000000  00000000  00
   1     0  ACTIVE      0 AWAKE        0000000000000000  00000000  00
   2     0  ACTIVE      0 AWAKE        0000000000000000  00000000  00
   3     0  ACTIVE      0 AWAKE        0000000000000000  00000000  00
   4     0  DISABLED    0 AWAKE        0000000000000000  00000000  00        <--SMT=4 (checked earlier), so CPUs 4-7 are 1 virtual processor, which is DISABLED (folding)
   5    11  DISABLED    0 SLEEPING     00000000515B4478  29DBE3CA  02
   6    11  DISABLED    0 SLEEPING     00000000515B4477  2C029174  02
   7    11  DISABLED    0 SLEEPING     00000000515B4477  2C0292A1  02
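The same count can be done with awk instead of by eye. A minimal sketch, assuming the column layout shown above (logical CPU number first, ACTIVE or DISABLED in the third column) and that a folded virtual processor disables all of its SMT threads, as CPUs 4-7 do here; vpm_summary is a hypothetical name.

```shell
#!/bin/sh
# Sketch: summarize "echo vpm | kdb" output (hypothetical helper).
# Assumes data rows start with the logical CPU number, the third column is
# ACTIVE or DISABLED, and a folded VP disables all of its SMT threads.

# Usage: echo vpm | kdb | vpm_summary <smt_threads_per_vp>
vpm_summary() {
    smt=${1:-4}
    awk -v smt="$smt" '
        $1 ~ /^[0-9]+$/ && ($3 == "ACTIVE" || $3 == "DISABLED") {
            total++
            if ($3 == "ACTIVE") active++
        }
        END {
            printf "logical CPUs: %d active of %d\n", active, total
            printf "virtual processors (SMT=%d): %d active of %d\n", smt, int(active / smt), int(total / smt)
        }'
}
```

Run as echo vpm | kdb | vpm_summary 4. For the output above it reports 4 of 8 logical CPUs and 1 of 2 virtual processors as active.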

------------------------------------

Check VSCSI adapter mapping:
(run this on the VIO client, not on the VIO server)

root@bb_lpar: / # echo "cvai" | kdb | grep vscsi                              <--cvai is a kdb subcommand
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01A83C0
vscsi0     0x000007 0x0000000000 0x0                aix-vios1->vhost2         <--shows which vhost is used on which vio server for this client
vscsi1     0x000007 0x0000000000 0x0                aix-vios1->vhost1
vscsi2     0x000007 0x0000000000 0x0                aix-vios2->vhost2
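To get just the client adapter, VIO server and vhost as three columns (for example for a report), the last field can be split at "->". A sketch assuming each vscsi row ends with a <vios>-><vhost> field as shown above; cvai_map is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print "client_adapter vios vhost" from "echo cvai | kdb" output.
# Assumes each vscsi row ends with "<vios>-><vhost>" (hypothetical helper).

# Usage: echo cvai | kdb | cvai_map
cvai_map() {
    awk '$1 ~ /^vscsi[0-9]+$/ && $NF ~ /->/ {
        n = index($NF, "->")              # position of the "->" separator
        print $1, substr($NF, 1, n - 1), substr($NF, n + 2)
    }'
}
```

For the output above, echo cvai | kdb | cvai_map prints one line per adapter, starting with vscsi0 aix-vios1 vhost2.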

------------------------------------

Check NPIV adapter mapping:
(run this on the VIO client, not on the VIO server)

root@bb_lpar: / # echo "vfcs" | kdb                                            <--vfcs is a kdb subcommand
...
NAME      ADDRESS             STATE   HOST      HOST_ADAP  OPENED NUM_ACTIVE
fcs0      0xF1000A000033A000  0x0008  aix-vios1 vfchost8  0x01    0x0000       <--shows which vfchost is used on vio server for this client
fcs1      0xF1000A0000338000  0x0008  aix-vios2 vfchost6  0x01    0x0000
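If the VIO server name is long, the HOST and HOST_ADAP columns can run together (see comment 8 below: vios36_Cvfchost2). A sketch that prints "adapter vios vfchost" and, in that case, splits at the trailing vfchost<N>; the VIOS name it prints may still be the truncated one. It assumes the column order shown above and that server adapters are named vfchost<N>; vfcs_map is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print "client_adapter vios vfchost" from "echo vfcs | kdb" output.
# Assumes columns NAME ADDRESS STATE HOST HOST_ADAP ... (hypothetical helper).
# When HOST and HOST_ADAP are run together, split at the trailing vfchost<N>.

# Usage: echo vfcs | kdb | vfcs_map
vfcs_map() {
    awk '$1 ~ /^fcs[0-9]+$/ {
        if ($5 ~ /^vfchost[0-9]+$/) {
            print $1, $4, $5                          # normal layout
        } else if (match($4, /vfchost[0-9]+$/)) {
            print $1, substr($4, 1, RSTART - 1), substr($4, RSTART)
        }
    }'
}
```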

------------------------------------

Check physical FC adapter settings (not in a virtual environment):
(dyntrk, fc_err_recov, num_cmd_elems)

These are the settings we would like to verify:
----------
root@bb_lpar: / # lsattr -El fscsi0| egrep 'dyntrk|fc_err_recov'
dyntrk       yes       Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True

root@bb_lpar: / # lsattr -El fcs0| grep num_cmd_elems
num_cmd_elems 200        Maximum number of COMMANDS to queue to the adapter True
----------

Verifying the settings in the kernel:
1. root@bb_lpar: / # echo efscsi fscsi0 | kdb | grep efscsi_ddi
    struct efscsi_ddi ddi = 0xF1000A06007FA080                            <--this hex value will be used in the next step


2. root@bb_lpar: / # echo dd 0xF1000A06007FA080+20 2 | kdb                <--append "+20 2" to the hex value above
...                                                                       (+20 is a fixed hex offset, 2 is the number of doublewords to display)
F1000A06007FA0A0: 0101020202010200 000000B400000028  ...............(     <--on the specified locations you can decode the numbers there
                          FFDD     NNNNNNNN

FF = fc_err_recov: (we have "02" in this example, which is fast_fail)
01 = delayed_fail
02 = fast_fail

DD = dyntrk: (we have "01" in this example, which means "yes")
00 = disabled (no)
01 = enabled (yes)

NNNNNNNN = num_cmd_elems: (we have "B4" in this example, but some calculation is still needed)
1. convert to decimal: 000000B4 --> 180
2. add 20 to the decimal number: 180 + 20 = 200
(you must always add "20" to the decimal value you get)
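The decoding steps above can be wrapped in a small function. This sketch assumes exactly the layout shown in this example (fc_err_recov and dyntrk in the 5th and 6th byte of the first doubleword, num_cmd_elems in the first 4 bytes of the second) and applies the "+20" rule from above; other AIX levels may lay out efscsi_ddi differently. decode_efscsi_dd is a hypothetical name.

```shell
#!/bin/sh
# Sketch: decode the two doublewords printed by
#   echo dd <efscsi_ddi address>+20 2 | kdb
# using the byte positions of the example above (hypothetical helper).

# Usage: decode_efscsi_dd 0101020202010200 000000B400000028
decode_efscsi_dd() {
    ff=$(printf '%s\n' "$1" | cut -c9-10)        # fc_err_recov byte
    dyn_byte=$(printf '%s\n' "$1" | cut -c11-12) # dyntrk byte
    nn=$(printf '%s\n' "$2" | cut -c1-8)         # num_cmd_elems word (hex)
    case $ff in
        01) recov=delayed_fail ;;
        02) recov=fast_fail ;;
        *)  recov="unknown($ff)" ;;
    esac
    case $dyn_byte in
        00) dyn=no ;;
        01) dyn=yes ;;
        *)  dyn="unknown($dyn_byte)" ;;
    esac
    # Hex to decimal, then add 20 as described in the steps above.
    cmds=$(( $(printf '%d' "0x$nn") + 20 ))
    echo "fc_err_recov=$recov dyntrk=$dyn num_cmd_elems=$cmds"
}
```

The two words can be taken from the kdb output with awk, e.g. echo dd 0xF1000A06007FA080+20 2 | kdb | awk '/^F1000A06007FA0A0:/ { print $2, $3 }'. For this example the function prints fc_err_recov=fast_fail dyntrk=yes num_cmd_elems=200, which matches lsattr.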

------------------------------------ 

Volume group and LV info:

The volgrp subcommand displays information about a volume group and its logical volumes.
The volgrp structure addresses are registered in the devsw table, in the DSDPTR field.
(devsw: displays the device switch table)

root@bb_lpar: /dev # echo devsw | kdb | grep dsdptr | grep -v 00000000
   dsdptr:    F1000A0600751800                                                 <--this address will be used with the volgrp subcommand
   dsdptr:    05A50280
   dsdptr:    F1000A0600751400

root@bb_lpar: /dev # echo volgrp F1000A0600751800| kdb                         <--displays info about given volgrp
...
VOLGRP............. F1000A0600751800
vg_eyec............ 4C564D766F6C6772 (LVMvolgr)
vg_name............ rootvg
vg_ras_name........ rootvg
vg_id.............. 00080E820000D900000001335FBB8276
vg_lock.......... @ F1000A0600751868    vg_lock............ 0000000000000000
major_num.......... 0000000A            flags.............. 00040001
snapshot_copy...... 0000                partshift.......... 0012  (128M)
ltg_shift.......... 0001  (256K)        open_count......... 000A
max_lvs............ 0100                max_pvs............ 0020
....
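The devsw and volgrp steps can be chained in a loop that prints the VG name for each non-zero dsdptr. A sketch only: not every dsdptr necessarily points to a volgrp structure, so addresses without a vg_name line are skipped. KDB_CMD, list_vgs and pp_size_mb are hypothetical names (KDB_CMD only lets the kdb call be swapped out, e.g. for testing). pp_size_mb assumes partshift is log2 of the physical partition size in 512-byte blocks, which is consistent with 0012 -> 128M in the output above (2^18 x 512 bytes = 128 MB).

```shell
#!/bin/sh
# Sketch: devsw -> volgrp chain plus partshift decoding (hypothetical helpers).

# Print "<dsdptr> <vg_name>" for every non-zero dsdptr that looks like a volgrp.
list_vgs() {
    kdb_cmd=${KDB_CMD:-kdb}
    echo devsw | $kdb_cmd |
        awk '$1 == "dsdptr:" && $2 !~ /^0+$/ { print $2 }' |
        while read -r addr; do
            name=$(echo "volgrp $addr" | $kdb_cmd |
                awk '$1 ~ /^vg_name\.+$/ { print $2; exit }')
            if [ -n "$name" ]; then
                echo "$addr $name"
            fi
        done
}

# PP size in MB from partshift (hex), assuming partshift = log2(PP size / 512).
pp_size_mb() {
    s=$(printf '%d' "0x${1#0x}")
    echo $(( (512 << s) / 1048576 ))
}
```

On the LPAR above, list_vgs should print F1000A0600751800 rootvg, and pp_size_mb 0012 prints 128.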

------------------------------------

Check hcheck_interval value of a disk:

1. root@bb_lpar: / # echo lke | kdb | grep pcm
 59 F1000000A063D200 05A60000 00030000 02080242 /usr/lib/drivers/aixdiskpcmke      <--the first column is the slot number we will use (here 59)

2. root@bb_lpar: / # echo "lke -s 59" | kdb | grep le_data
  le_data........ 0000000005A80000   le_datasize.... 0000000000002828              <--this shows the le_data value
                                                                                   (we will use it with the adevq subcommand)

3. root@bb_lpar: / # kdb
(0)> adevq
Unable to find <pcm_info>
Enter the pcm_info address (in hex): 0000000005A80000                              <--enter the le_data value from step 2 here
NAME      ADDR               STATE MACHINE  ACTIVE_IO                              <--then we will see the list of hdisks
hdisk1    0xF1000A0600740400 0x0       0x       0                                  <--choose the address of a disk and run adevq against it
NAME      ADDR               STATE MACHINE  ACTIVE_IO
hdisk2    0xF1000A0600740E00 0x0       0x       0
NAME      ADDR               STATE MACHINE  ACTIVE_IO
hdisk3    0xF1000A0600741800 0x0       0x       0
NAME      ADDR               STATE MACHINE  ACTIVE_IO
hdisk0    0xF1000A0600742200 0x0       0x       0


4. (0)> adevq 0xF1000A0600740400 | grep hcheck                                     <--this shows the address of the hcheck structure, which we will use next
    hcheck_t &hcheck = 0xF1000A0600740470


5. (0)> ahcheck 0xF1000A0600740470 | grep interval
    uint interval = 0x0                                                            <--this shows hcheck_interval value in hex (we have 0)
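Copying the addresses by hand is error-prone, so the values can be extracted with awk. A minimal sketch; pcm_slot and kdb_field are hypothetical names, and the adevq step is left interactive because it prompts for the pcm_info address.

```shell
#!/bin/sh
# Sketch: extract values used in the hcheck_interval steps (hypothetical helpers).

# Step 1: slot number of the aixdiskpcmke kernel extension in "lke" output.
pcm_slot() {
    awk '/aixdiskpcmke/ { print $1; exit }'
}

# Print the value that follows a field label. Handles both kdb styles seen
# above: "le_data........ 0000000005A80000" and "uint interval = 0x0".
kdb_field() {
    awk -v want="$1" '{
        for (i = 1; i < NF; i++) {
            name = $i
            sub(/\.+$/, "", name)            # drop the trailing dots
            if (name == want) {
                v = $(i + 1)
                if (v == "=" && i + 2 <= NF) v = $(i + 2)
                print v
                exit
            }
        }
    }'
}
```

Possible use: slot=$(echo lke | kdb | pcm_slot), then echo "lke -s $slot" | kdb | kdb_field le_data. Likewise echo "ahcheck 0xF1000A0600740470" | kdb | kdb_field interval, assuming ahcheck accepts the address over a pipe like the other subcommands; printf '%d\n' 0x0 turns the result into decimal.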

------------------------------------

Check for a process which is using a specific network port:


1. root@bb_lpar: / # netstat -Aan | grep 22                                        <--the first column is the socket (PCB) address of the port
f1000e000330ebb8 tcp4       0      0  *.22   *.*    LISTEN


2. root@bb_lpar: / # kdb
(0)> sockinfo f1000e000330ebb8 tcpcb | grep pvproc                                 <--pass the address to the sockinfo subcommand (grep for pvproc)
pvproc+016000   88*sshd     ACTIVE 058000E 03A00A2 000000083846E480   0 0001

3. (0)> hcal 058000E                                                               <--convert to decimal (this is the PID of the process)
Value hexa: 0058000E          Value decimal: 5767182

(0)> e                                                                             <--exit from kdb

4. root@bb_lpar: / # ps -fp 5767182                                                <--shows the process of a given pid
     UID      PID     PPID   C    STIME    TTY  TIME CMD
    root  5767182  3801250   0   May 09      -  0:00 /usr/sbin/sshd
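netstat -Aan | grep 22 also matches ports like 2222 or addresses containing 22, and the hcal step can be replaced by shell arithmetic. A sketch with two hypothetical helpers, assuming the column layouts shown above. Note that the column after the PID is the PPID: 03A00A2 is 3801250, which matches the PPID reported by ps.

```shell
#!/bin/sh
# Sketch: port -> PCB address -> PID helpers (hypothetical names).
# Assumes netstat -Aan prints the PCB address first and the local address
# fifth, and the pvproc line is: slot*name, state, PID, PPID (hex).

# Print PCB addresses whose local port is exactly $1.
port_pcb() {
    awk -v port="$1" '{
        n = split($5, part, ".")
        if (n > 1 && part[n] == port) print $1
    }'
}

# Convert a pvproc line from "sockinfo ... | grep pvproc" to decimal PID/PPID.
pvproc_to_pid() {
    while read -r slot name state pid ppid rest; do
        case $slot in
            pvproc*) ;;
            *) continue ;;
        esac
        cmd=$(printf '%s\n' "$name" | sed 's/^[0-9]*\*\{0,1\}//')
        printf 'PID=%d PPID=%d CMD=%s\n' "0x$pid" "0x$ppid" "$cmd"
    done
}
```

Possible use: pcb=$(netstat -Aan | port_pcb 22 | head -n 1), then echo "sockinfo $pcb tcpcb" | kdb | grep pvproc | pvproc_to_pid. For the example above this gives PID=5767182 PPID=3801250 CMD=sshd.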

------------------------------------

Checking vNIC adapters:

# echo "vnic" | kdb
+-------------------------------------------------+
|       pACS       | Device | Link |    State     |
|------------------+--------+------+--------------|
| F1000A00328C0000 |  ent1  |  Up  |     Open     |
|------------------+--------+------+--------------|
| F1000A00328E0000 |  ent2  |  Up  |     Open     |
+-------------------------------------------------+
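For a quick health check (e.g. from cron), rows whose Link is not Up or whose State is not Open can be filtered out of this table. A sketch assuming the "|"-separated layout shown above; vnic_problems is a hypothetical name, and empty output means all vNIC adapters look fine.

```shell
#!/bin/sh
# Sketch: report vNIC rows from "echo vnic | kdb" whose Link is not "Up" or
# whose State is not "Open" (hypothetical helper, layout assumed from above).

# Usage: echo vnic | kdb | vnic_problems
vnic_problems() {
    awk -F'|' 'NF >= 5 {
        dev = $3; lnk = $4; st = $5
        gsub(/^ +| +$/, "", dev)             # trim the padding in each cell
        gsub(/^ +| +$/, "", lnk)
        gsub(/^ +| +$/, "", st)
        if (dev ~ /^ent[0-9]+$/ && (lnk != "Up" || st != "Open"))
            print dev, "link=" lnk, "state=" st
    }'
}
```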

------------------------------------

Check NFS statistics:

echo clio | kdb
(0)> clio
Server           IO count   IO Waiting   Max Wait(s)   Mount Point
"212.111.0.11"   0          0            0             "/nfsdata"

23 comments:

  1. There is not much info about kdb.
    Thanks.

  2. This is great work on kdb.

    Is it possible to get the virtual Ethernet settings using kdb, like 'echo efscsi fscsi0 | kdb | grep'?

    Thanks

    1. I could not find anything on that. Probably someone from an IBM Lab can give you the answer.

  3. Thanks, mate. I hadn't seen your reply until now. I had a "dead gateway detected" problem again, so I came back to check whether you had replied. Thanks as always :)

  4. Hi!
    I'm interested in getting the size of the DMA region used by vfc on AIX 7.1 TL1. According to this document http://www.ibm.com/developerworks/wikis/download/attachments/104533522/AIX_Disk_IO_Tuning_093011.pdf AIX 6.1 TL2 and later use 128Mb for NPIV vfc. Can I get it via kdb?
    Thanks

    1. Hi, I think someone from IBM can help you with this; you should probably open a call with IBM. There are very few good publications about kdb, and I could not find this info in them.

  5. Hi, great article! However, my question is whether kdb acts like KDB, meaning whether kdb could freeze all processes:

    "When the KDB kernel debugger is invoked by a condition, it is the only running program. All other processes are stopped and processor interrupts are disabled."

    info from:
    http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.kdb%2Fdoc%2Fkdb%2Fkdb_debuggerintro.htm

    In other words, is kdb safe to use in, for instance, a database environment?

    1. Hi, the kdb command and KDB (the kernel debugger) are two very different things; only their names are the same... thanks to IBM :). The kdb command can be used to analyze a running (!) system, so no processes will be stopped. I am not an expert on the kdb command, but I have used it several times and it did not cause any issues. (To be on the safe side, try it on a test system first :)) I have never used the KDB kernel debugger. As far as I know, it needs a special bosboot and a reboot, and then you have to work from a terminal console.
      Here are two links on how to invoke each of them:
      kdb: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.kdb/doc/kdb/kdbcommandinvoke.htm
      KDB: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.kdb/doc/kdb/kdbdebuggerinvoke.htm

    2. Hello guys, the kdb command is very powerful, but I suggest not using it on an active Oracle DB server :). Test it first on a non-production system in several CPU-load scenarios.

      good luck!!

      You can find some information here:
      http://www.ibm.com/developerworks/aix/library/au-kdbsteps/

  6. Hi, is there any kdb command to find the VIOS details on a VIO client running 5.3?

    1. Hi, everything I know about kdb is on this page... check if it is there.

    2. echo "cvscsi\ncvai"| kdb -script |grep vhost

  7. Hi, is there any kdb command to find the VIOS details on a VIO client running 6.1?

    1. Check VSCSI adapter mapping:
      (run this on vio client, not on vio server)

      root@bb_lpar: / # echo "cvai" | kdb | grep vscsi <--cvai is a kdb subcommand
      read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01A83C0
      vscsi0 0x000007 0x0000000000 0x0 aix-vios1->vhost2 <--shows which vhost is used on which vio server for this client
      vscsi1 0x000007 0x0000000000 0x0 aix-vios1->vhost1
      vscsi2 0x000007 0x0000000000 0x0 aix-vios2->vhost2

      ------------------------------------

      Check NPIV adapter mapping:
      (run this on vio client, not on vio server)

      root@bb_lpar: / # echo "vfcs" | kdb <--vfcs is a kdb subcommand
      ...
      NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
      fcs0 0xF1000A000033A000 0x0008 aix-vios1 vfchost8 0x01 0x0000 <--shows which vfchost is used on vio server for this client
      fcs1 0xF1000A0000338000 0x0008 aix-vios2 vfchost6 0x01 0x0000

    2. Hello guys, sorry to bother you, and sorry for asking so late in a thread from 2013.

      Just a simple question: how can we see the VIOS server's full name in the column when the names are long, so that it doesn't get hidden or overwritten by the next column's value? Thanks.
      Any quick response would be appreciated.

  8. NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
    fcs0 0xF1000A002EA46000 0x0008 vios36_Cvfchost2 0x01 0x0000
    fcs1 0xF1000A002EA44000 0x0008 vios36_Cvfchost3 0x01 0x0000

    Is there a reference anywhere that explains what the hex value in the STATE column means?

  9. The most comprehensive information I could find on kdb. Thank you!

  10. An AIX LPAR ends up in the kdb state while installing the OS. How can I recover from it?
    *********************
    Special Regs:
    %IV: 00000700 %CR: 20000000 %XER: 00000018 %DSISR: 00000000
    %SRR0: 000000000ec71cb8 %SRR1: 0000000000081002
    %LR: 000000000ec71ca4 %CTR: 000000000ec71ca0
    %DAR: 0000000000000000
    Virtual PID = 1
    ok
    0 >
    ***************************

  11. Much appreciated, sir. Thanks for the clear document.
