KDB:
The KDB kernel debugger and the kdb command are useful for debugging device drivers, kernel extensions, and the kernel itself. Although they appear similar, the KDB kernel debugger and the kdb command are two separate tools:
KDB KERNEL DEBUGGER
It is integrated into the kernel and allows full control of the system while a debugging session is in progress.
KDB COMMAND:
It is implemented as an ordinary user-space program and can be used for analyzing the following:
1. A running system: When used to analyze a running system, the kdb command opens the /dev/pmem special file, which allows direct access to the system's physical memory. The kdb command performs its own address translation internally using the same algorithms as the KDB kernel debugger.
2. A system dump file produced by a previously crashed system: A system dump contains certain critical data structures. Only the memory belonging to the process that was running on the processor that created the dump image can be included in the dump file. When you work with a system dump, any subcommands that modify memory are not valid because the system dump is merely a snapshot of the real memory in a system.
When you are analyzing a system dump file, the kdb command must be started with arguments that specify the location of the dump file and the kernel file:
# kdb /var/adm/ras/vmcore.0 /unix
(The kernel file is used by the kdb command to resolve symbol names from the dump file.)
------------------------------------
A very valuable benefit of kdb is that a device setting stored in ODM (lsattr ...) can be compared with the real-time value used in the running kernel.
------------------------------------
KDB SUBCOMMANDS:
help display context lists subcommands with the context "display"
p -? list of parameters for the p subcommand and a brief description
! <command> shell escape (provides a convenient way to run UNIX commands without leaving kdb)
hi print history
lke list loaded extensions
pvol -M <major> -m <minor> display physical volume info
stat system status info
status processor status
e exit from kdb
------------------------------------
echo vfcs fcs0 | kdb | grep num_cmd_elems <--shows num_cmd_elems in hex on a VIO client with NPIV (compare with ODM: lsattr -El fcs0)
(if you change num_cmd_elems with chdev, you can check in kdb whether it has really been changed)
echo scsidisk hdisk0 | kdb | grep queue_depth <--shows the real-time value (in hex) of queue_depth for the given disk
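kdb reports these values in hex, while lsattr reports them in decimal, so a conversion step makes the comparison easier. The following is a minimal sketch in POSIX shell. The name hex2dec is a hypothetical helper (it is not a kdb subcommand), and the input values are only illustrations:

```shell
# hex2dec: hypothetical helper that converts a hex string (with or
# without a leading 0x) to decimal so it can be compared with lsattr.
hex2dec() {
    h=${1#0x}             # drop an optional 0x prefix
    h=${h#0X}             # ...or an optional 0X prefix
    printf '%d\n' "0x$h"  # printf parses the 0x-prefixed string as hex
}

hex2dec 0xC8        # prints 200
hex2dec 000000B4    # prints 180
```

The decimal result can then be checked against the value shown by lsattr -El for the same device.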
------------------------------------
Checking how many virtual processors are active:
root@bb_lpar:/ # echo vpm | kdb <--vpm is a kdb subcommand
...
0 0 ACTIVE 0 AWAKE 0000000000000000 00000000 00
1 0 ACTIVE 0 AWAKE 0000000000000000 00000000 00
2 0 ACTIVE 0 AWAKE 0000000000000000 00000000 00
3 0 ACTIVE 0 AWAKE 0000000000000000 00000000 00
4 0 DISABLED 0 AWAKE 0000000000000000 00000000 00 <--smt=4 was checked earlier, so rows 4-7 belong to 1 virtual processor, which is DISABLED (folding)
5 11 DISABLED 0 SLEEPING 00000000515B4478 29DBE3CA 02
6 11 DISABLED 0 SLEEPING 00000000515B4477 2C029174 02
7 11 DISABLED 0 SLEEPING 00000000515B4477 2C0292A1 02
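To summarize such a listing, the rows can be counted per state. This is a minimal sketch that works on the sample rows above. The helper name count_vpm_states is hypothetical, and it assumes the state is the third whitespace-separated field, as in the rows shown:

```shell
# Rows copied from the vpm output above; on a live system the input
# would come from: echo vpm | kdb
vpm_sample='0 0 ACTIVE 0 AWAKE 0000000000000000 00000000 00
1 0 ACTIVE 0 AWAKE 0000000000000000 00000000 00
2 0 ACTIVE 0 AWAKE 0000000000000000 00000000 00
3 0 ACTIVE 0 AWAKE 0000000000000000 00000000 00
4 0 DISABLED 0 AWAKE 0000000000000000 00000000 00
5 11 DISABLED 0 SLEEPING 00000000515B4478 29DBE3CA 02
6 11 DISABLED 0 SLEEPING 00000000515B4477 2C029174 02
7 11 DISABLED 0 SLEEPING 00000000515B4477 2C0292A1 02'

# count_vpm_states: hypothetical helper; counts rows whose third
# field is ACTIVE or DISABLED and ignores everything else.
count_vpm_states() {
    awk '$3 == "ACTIVE"   { active++ }
         $3 == "DISABLED" { disabled++ }
         END { printf "active=%d disabled=%d\n", active, disabled }'
}

printf '%s\n' "$vpm_sample" | count_vpm_states   # prints active=4 disabled=4
```

With smt=4, four rows correspond to one virtual processor, so the sample shows one active and one folded virtual processor.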
------------------------------------
Check VSCSI adapter mapping:
(run this on vio client, not on vio server)
root@bb_lpar: / # echo "cvai" | kdb | grep vscsi <--cvai is a kdb subcommand
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01A83C0
vscsi0 0x000007 0x0000000000 0x0 aix-vios1->vhost2 <--shows which vhost is used on which vio server for this client
vscsi1 0x000007 0x0000000000 0x0 aix-vios1->vhost1
vscsi2 0x000007 0x0000000000 0x0 aix-vios2->vhost2
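The last column packs the VIO server name and the vhost adapter into one field, so it can be split for easier reading or scripting. This is a minimal sketch against the sample output above. The helper name vscsi_map is hypothetical, and it assumes the mapping is always the last field in the form server->vhost:

```shell
# Lines copied from the cvai output above; on a live system the input
# would come from: echo "cvai" | kdb
cvai_sample='read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01A83C0
vscsi0 0x000007 0x0000000000 0x0 aix-vios1->vhost2
vscsi1 0x000007 0x0000000000 0x0 aix-vios1->vhost1
vscsi2 0x000007 0x0000000000 0x0 aix-vios2->vhost2'

# vscsi_map: hypothetical helper; prints "adapter server vhost" for
# every line that starts with vscsi.
vscsi_map() {
    awk '/^vscsi/ {
        split($NF, part, "->")   # last field looks like server->vhost
        printf "%s %s %s\n", $1, part[1], part[2]
    }'
}

printf '%s\n' "$cvai_sample" | vscsi_map
```

For the sample this prints one line per client adapter, for example "vscsi0 aix-vios1 vhost2".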
------------------------------------
Check NPIV adapter mapping:
(run this on vio client, not on vio server)
root@bb_lpar: / # echo "vfcs" | kdb <--vfcs is a kdb subcommand
...
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs0 0xF1000A000033A000 0x0008 aix-vios1 vfchost8 0x01 0x0000 <--shows which vfchost is used on vio server for this client
fcs1 0xF1000A0000338000 0x0008 aix-vios2 vfchost6 0x01 0x0000
------------------------------------
Check physical FC adapter setting (not in virtual environment):
(dyntrk, fc_err_recov, num_cmd_elems)
These are the settings that we would like to verify:
----------
root@bb_lpar: / # lsattr -El fscsi0| egrep 'dyntrk|fc_err_recov'
dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
root@bb_lpar: / # lsattr -El fcs0| grep num_cmd_elems
num_cmd_elems 200 Maximum number of COMMANDS to queue to the adapter True
----------
Verifying the settings from kernel:
1. root@bb_lpar: / # echo efscsi fscsi0 | kdb | grep efscsi_ddi
struct efscsi_ddi ddi = 0xF1000A06007FA080 <--this hex value will be used
2. root@bb_lpar: / # echo dd 0xF1000A06007FA080+20 2 | kdb <--append "+20 2" to the above hex value
... (20 is a fixed hex offset: ...A080 + 20 = ...A0A0)
F1000A06007FA0A0: 0101020202010200 000000B400000028 ...............( <--the values can be decoded at the positions marked below
FFDD NNNNNNNN
FF = fc_err_recov: (we have "02" in this example, which is fast_fail)
01 = delayed_fail
02 = fast_fail
DD = dyntrk: (we have "01" in this example here, which means "yes")
00 = disabled (no)
01 = enabled (yes)
NNNNNNNN = num_cmd_elems: (we have "000000B4" in this example, but some calculation is still needed)
1. change to decimal value: 000000B4 --> 180
2. add 20 to the decimal number: 180 + 20 = 200
(you must always add "20" to the decimal value you get)
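The decoding steps above can be sketched in shell. All helper names here are hypothetical. The code tables and the "+20" rule come from the walkthrough above, and the FF and DD bytes are passed in by hand because they are read by eye from the dd output:

```shell
# fc_err_recov_name: maps the FF byte to the attribute value.
fc_err_recov_name() {
    case $1 in
        01) echo delayed_fail ;;
        02) echo fast_fail ;;
        *)  echo "unknown($1)" ;;
    esac
}

# dyntrk_name: maps the DD byte to yes/no.
dyntrk_name() {
    case $1 in
        00) echo no ;;
        01) echo yes ;;
        *)  echo "unknown($1)" ;;
    esac
}

# num_cmd_elems_from_hex: converts the NNNNNNNN field to decimal and
# adds 20, as described in the steps above.
num_cmd_elems_from_hex() {
    raw=$(printf '%d' "0x$1")   # e.g. 000000B4 -> 180
    echo $((raw + 20))
}

# Sample values: FF=02, DD=01, and the first 8 hex digits of the
# second word printed by dd (000000B400000028).
word2=000000B400000028
nnnn=$(printf '%s' "$word2" | cut -c1-8)

echo "fc_err_recov=$(fc_err_recov_name 02)"
echo "dyntrk=$(dyntrk_name 01)"
echo "num_cmd_elems=$(num_cmd_elems_from_hex "$nnnn")"   # prints num_cmd_elems=200
```

The three printed values match the lsattr output shown earlier (fast_fail, yes, 200).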
------------------------------------
Volume group and lv info:
The volgrp subcommand displays information about a volume group and its logical volumes.
The volgrp structure addresses are registered in the devsw table in the DSDPTR field.
(devsw: displays miscellaneous kernel data structures)
root@bb_lpar: /dev # echo devsw | kdb | grep dsdptr | grep -v 00000000
dsdptr: F1000A0600751800 <--this will be used for "volgrp" command
dsdptr: 05A50280
dsdptr: F1000A0600751400
root@bb_lpar: /dev # echo volgrp F1000A0600751800| kdb <--displays info about given volgrp
...
VOLGRP............. F1000A0600751800
vg_eyec............ 4C564D766F6C6772 (LVMvolgr)
vg_name............ rootvg
vg_ras_name........ rootvg
vg_id.............. 00080E820000D900000001335FBB8276
vg_lock.......... @ F1000A0600751868 vg_lock............ 0000000000000000
major_num.......... 0000000A flags.............. 00040001
snapshot_copy...... 0000 partshift.......... 0012 (128M)
ltg_shift.......... 0001 (256K) open_count......... 000A
max_lvs............ 0100 max_pvs............ 0020
....
------------------------------------
Check hcheck_interval value of a disk:
1. root@bb_lpar: / # echo lke | kdb | grep pcm
59 F1000000A063D200 05A60000 00030000 02080242 /usr/lib/drivers/aixdiskpcmke <--this shows the slot number that we will use (here 59)
2. root@bb_lpar: / # echo "lke -s 59" | kdb | grep le_data
le_data........ 0000000005A80000 le_datasize.... 0000000000002828 <--this shows the le_data value
(we will use this in the adevq subcommand)
3. root@bb_lpar: / # kdb
(0)> adevq
Unable to find <pcm_info>
Enter the pcm_info address (in hex): 0000000005A80000 <--the above value is given here
NAME ADDR STATE MACHINE ACTIVE_IO <--then we will see the list of hdisks
hdisk1 0xF1000A0600740400 0x0 0x 0 <--choose the address of a disk and run adevq against it
NAME ADDR STATE MACHINE ACTIVE_IO
hdisk2 0xF1000A0600740E00 0x0 0x 0
NAME ADDR STATE MACHINE ACTIVE_IO
hdisk3 0xF1000A0600741800 0x0 0x 0
NAME ADDR STATE MACHINE ACTIVE_IO
hdisk0 0xF1000A0600742200 0x0 0x 0
4. (0)> adevq 0xF1000A0600740400 | grep hcheck <--this shows the address of hcheck, which we will use next
hcheck_t &hcheck = 0xF1000A0600740470
5. (0)> ahcheck 0xF1000A0600740470 | grep interval
uint interval = 0x0 <--this shows hcheck_interval value in hex (we have 0)
------------------------------------
Check for a process which is using a specific network port:
1. root@bb_lpar: / # netstat -Aan | grep 22 <--check for address of the port
f1000e000330ebb8 tcp4 0 0 *.22 *.* LISTEN
2. root@bb_lpar: / # kdb
(0)> sockinfo f1000e000330ebb8 tcpcb | grep pvproc <--feed the address to the sockinfo subcommand (grep for pvproc)
pvproc+016000 88*sshd ACTIVE 058000E 03A00A2 000000083846E480 0 0001
3. (0)> hcal 058000E <--calculate decimal value (this is the pid of the process)
Value hexa: 0058000E Value decimal: 5767182
(0)> e <--exit from kdb
4. root@bb_lpar: / # ps -fp 5767182 <--shows the process of a given pid
UID PID PPID C STIME TTY TIME CMD
root 5767182 3801250 0 May 09 - 0:00 /usr/sbin/sshd
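The hex-to-decimal step can also be done outside kdb. This is a minimal sketch that takes the pvproc line from the sample above. The helper name pid_from_pvproc is hypothetical, and it assumes the hex PID is the fourth field of that line (058000E in the sample):

```shell
# Line copied from the sockinfo output above.
pvproc_line='pvproc+016000 88*sshd ACTIVE 058000E 03A00A2 000000083846E480 0 0001'

# pid_from_pvproc: hypothetical helper; extracts the fourth field and
# converts it from hex to decimal.
pid_from_pvproc() {
    hexpid=$(printf '%s\n' "$1" | awk '{ print $4 }')
    printf '%d\n' "0x$hexpid"
}

pid=$(pid_from_pvproc "$pvproc_line")
echo "ps -fp $pid"     # prints: ps -fp 5767182
```

The printed command is the same ps call used in step 4.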
------------------------------------
Checking vNIC adapters
# echo "vnic" | kdb
+-------------------------------------------------+
| pACS | Device | Link | State |
|------------------+--------+------+--------------|
| F1000A00328C0000 | ent1 | Up | Open |
|------------------+--------+------+--------------|
| F1000A00328E0000 | ent2 | Up | Open |
+-------------------------------------------------+
------------------------------------
Check NFS statistics:
echo clio | kdb
(0)> clio
Server IO count IO Waiting Max Wait(s) Mount Point
"212.111.0.11" 0 0 0 "/nfsdata"
There is not much info about kdb.
Thanks.
:)
This is great work on kdb.
Is it possible to get the virtual Ethernet settings using kdb? like 'echo efscsi fscsi0 | kdb | grep' ?
Thanks
I could not find anything on that. Probably someone from an IBM Lab can give you the answer.
Thanks mate. I didn't see your reply since then. Now again I had a dead gateway detected problem, so I wanted to check if I received a reply from you. Thanks always :)
Welcome...I appreciate any comments :)
Hi!
I'm interested in getting the size of the DMA region used by vfc on AIX 7.1 TL1. According to this document http://www.ibm.com/developerworks/wikis/download/attachments/104533522/AIX_Disk_IO_Tuning_093011.pdf AIX 6.1 TL2 and later use 128 MB for NPIV vfc. Can I get it via kdb?
Thanks
Hi, I think someone from IBM can help you with this; you should probably open an IBM call. There are very few good publications about kdb, and I could not find this info there.
Hi, great article! However, my question is whether kdb acts like KDB, meaning whether kdb could freeze all processes:
"When the KDB kernel debugger is invoked by a condition, it is the only running program. All other processes are stopped and processor interrupts are disabled."
info from:
http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.kdb%2Fdoc%2Fkdb%2Fkdb_debuggerintro.htm
In other words, is kdb safe to use in, for instance, a database environment?
Hi, the kdb command and KDB (Kernel Debugger) are 2 very different things, just their names are the same...thanks to IBM :). The kdb command can be used to analyze a running (!) system, so no processes will be stopped. I am not an expert on the kdb command, but I have used it several times and it did not cause any issues. (To be on the safe side, try it on a test system first :)) I have never used KDB, the Kernel Debugger tool. As far as I know, it needs a special bosboot and a reboot, and then you have to work from a terminal console.
Here are 2 links on how to invoke both of them:
kdb: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.kdb/doc/kdb/kdbcommandinvoke.htm
KDB: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.kdb/doc/kdb/kdbdebuggerinvoke.htm
Hello Guys, the kdb command is very powerful, but I suggest not using it on an active Oracle DB server :) Try it first on a non-production system in several CPU-consuming scenarios,
good luck!!
You can find some information here:
http://www.ibm.com/developerworks/aix/library/au-kdbsteps/
Hi, is there any kdb command to find the VIOS details on a VIO client running 5.3?
Hi, all the info that I know about kdb is on this page...check if it is there....
echo "cvscsi\ncvai"| kdb -script |grep vhost
Thanks :-)
Hi, is there any kdb command to find the VIOS details on a VIO client running 6.1?
See the "Check VSCSI adapter mapping" and "Check NPIV adapter mapping" sections above (run the cvai and vfcs subcommands on the VIO client).
Hello Guys, sorry to bother you with a late question on a post from 2013.
Just a simple query: how can we see the VIOS server's full name in the HOST column when the name is long, so that it is not hidden or overwritten by the next column's value? Thanks.
Any quick response will be good.
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs0 0xF1000A002EA46000 0x0008 vios36_Cvfchost2 0x01 0x0000
fcs1 0xF1000A002EA44000 0x0008 vios36_Cvfchost3 0x01 0x0000
Is there a reference anywhere that shows what the hex value in the STATE column means?
The most comprehensive information I could find on kdb. Thank you!
The AIX LPAR ends up in the kdb state while installing the OS. How can I recover from it?
*********************
Special Regs:
%IV: 00000700 %CR: 20000000 %XER: 00000018 %DSISR: 00000000
%SRR0: 000000000ec71cb8 %SRR1: 0000000000081002
%LR: 000000000ec71ca4 %CTR: 000000000ec71ca0
%DAR: 0000000000000000
Virtual PID = 1
ok
0 >
***************************
Appreciated, Sir, thanks for the clear document