HW - ERRPT

Errpt - Diag - Alog

ERROR LOGGING:

The errdemon is started during system initialization and continuously monitors the special file /dev/error for new entries sent by either the kernel or applications. The label of each new entry is checked against the contents of the Error Record Template Repository, and if a match is found, additional information about the system environment or hardware status is added. The errdemon process allocates a memory buffer, and newly arrived entries are placed in this buffer before they are written to the log, to minimize the possibility of a lost entry. The error log is a circular log: it stores as many entries as fit within its defined size. The default log file is /var/adm/ras/errlog, and it is in binary format.

The name and size of the error log file and the size of the memory buffer may be viewed with the errdemon command:

# /usr/lib/errdemon -l

Log File                /var/adm/ras/errlog
Log Size                1048576 bytes
Memory Buffer Size      32768 bytes
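
If the log size has to be checked from a script, the listing above can be parsed with a small helper. This is only a sketch: the function name errlog_attr is made up for this example, and it assumes the attribute lines look like the sample output above (label first, value after it).

```shell
# Hypothetical helper (not part of AIX): print the value of one attribute
# from "errdemon -l" output read on stdin.
# $1 = attribute label exactly as printed, e.g. "Log Size"
errlog_attr() {
    awk -v label="$1" '
        index($0, label) == 1 {          # line starts with the label
            sub(label "[ \t]*", "")      # drop the label and the padding after it
            print $1                     # first remaining word is the value
        }'
}

# Assumed usage on an AIX host:
#   /usr/lib/errdemon -l | errlog_attr "Log Size"
```

The value could then be compared against a threshold before raising the limit with /usr/lib/errdemon -s.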

------------------------------

/usr/lib/errdemon               starts the errdemon program (e.g. after it was stopped with errstop)
/usr/lib/errstop                stops the error logging daemon initiated by the errdemon program
/usr/lib/errdemon -l            shows information about the error log file (path, size)
/usr/lib/errdemon -s 2000000    changes the maximum size of the error log file

errpt                           retrieves the entries in the error log
errpt -a -j AA8AB241            shows detailed info about the error (with -j, the error id can be specified)
errpt -s 1122164405 -e 1123100405
                                shows error log in a time period (-s start date, -e end date; date format: mmddhhmmyy)
errpt -d H                      shows hardware errors (errpt -d S: software errors)
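
The -s and -e flags take dates in mmddhhmmyy form, which is easy to mistype. A minimal sketch of a converter from a more readable date follows; the function name errpt_stamp and the "YYYY-MM-DD HH:MM" input format are assumptions made for this example.

```shell
# Hypothetical helper: turn "YYYY-MM-DD HH:MM" into the mmddhhmmyy form
# used by errpt -s / -e.
errpt_stamp() {
    # Split on "-", space and ":" -> $1=year $2=month $3=day $4=hour $5=minute
    echo "$1" | awk -F'[- :]' '{ printf "%s%s%s%s%s\n", $2, $3, $4, $5, substr($1, 3, 2) }'
}

# Assumed usage on an AIX host:
#   errpt -s "$(errpt_stamp '2005-11-22 16:44')" -e "$(errpt_stamp '2005-11-23 10:04')"
```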

Error Classes:
    H: Hardware
    S: Software
    O: Operator
    U: Undetermined

Error Type:
    P: Permanent (PERM) - unable to recover from error condition
       Pending (PEND) - the device or component may become unavailable soon due to many errors
       Performance (PERF) - the performance of the device or component has degraded to below an acceptable level
    T: Temporary - recovered from condition after several attempts
    I: Informational
    U: Unknown - Severity of the error cannot be determined
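
To get a quick overview of which classes and types dominate the log, the summary listing of errpt can be tallied. The sketch below assumes the usual summary layout (IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION, where T is the type and C the class); the function name errpt_tally is made up for this example.

```shell
# Hypothetical helper: count errpt summary lines per class/type pair.
# Reads errpt summary output on stdin, prints "<class> <type> <count>".
errpt_tally() {
    awk '$1 != "IDENTIFIER" && NF >= 5 { n[$4 " " $3]++ }   # skip the header line
         END { for (k in n) print k, n[k] }' | sort
}

# Assumed usage on an AIX host:
#   errpt | errpt_tally
```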


Types of Disk Errors:
DISK_ERR1: the disk was used heavily and should be replaced
DISK_ERR2: caused by loss of electrical power
DISK_ERR3: caused by loss of electrical power
DISK_ERR4: indicates bad blocks on the disk (if there is more than one entry in a week, replace the disk)
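
The "more than one DISK_ERR4 entry in a week" rule can be checked with a short filter. This is a sketch under assumptions: the function name repeat_disks is made up, the input is expected in the errpt summary layout (resource name in the 5th column), and limiting the listing to the last week is left to the caller via errpt -s.

```shell
# Hypothetical helper: print every resource name that appears more than
# once in errpt summary output read on stdin.
repeat_disks() {
    awk '$1 != "IDENTIFIER" && NF >= 5 { n[$5]++ }          # count per resource
         END { for (d in n) if (n[d] > 1) print d }' | sort
}

# Assumed usage on an AIX host (start date = one week ago, mmddhhmmyy):
#   errpt -J DISK_ERR4 -s 1122164405 | repeat_disks
```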


errclear                  deletes entries from the error log (smitty errclear)
errclear 7                deletes entries older than 7 days (0 clears all messages)
errclear -j CB4A951F 0    deletes all the messages with the specified ID              
errlogger                 logs operator messages to the system error log
                          (errlogger "This is a test message")


------------------------------

Mail notification via errpt and errnotify

AIX has an Error Notification object class (errnotify) in the Object Data Manager (ODM). An errnotify object is a "hook" into the error logging facility that causes a program to be executed whenever a matching error message is recorded. By default, there are a number of predefined errnotify entries. Each time an error is written to the error log, the error logging facility checks whether that entry matches the criteria of any of the Error Notification objects.

0. make sure the server can send mail correctly
1. create a text file (e.g. /tmp/errnotify.txt), which will be added to ODM


Add the below lines if you want notifications on all kinds of errpt entries:

errnotify:
  en_name = "mail_all_errlog"
  en_persistenceflg = 1
  en_method = "/usr/bin/errpt -a -l $1 | mail -s \"errpt $9 on `hostname`\" aix4adm@gmail.com"
        <--specify the email address here


Add the below lines if you want notifications on permanent hardware entries only:

errnotify:
  en_name = "mail_perm_hw"
  en_class = H
  en_persistenceflg = 1
  en_type = PERM
  en_method = "/usr/bin/errpt -a -l $1 | mail -s \"Permanent hardware errpt $9 on `hostname`\" aix4adm@gmail.com"



2. root@bb_lpar: / # odmadd /tmp/errnotify.txt                                 <--add the content of the text file to ODM
3. root@bb_lpar: / # odmget -q en_name=mail_all_errlog errnotify               <--check if it is added successfully
4. root@bb_lpar: / # errlogger "This is a test message"                        <--check mail notification with a test errpt entry

You can delete the added errnotify object if it is not needed anymore:
root@bb_lpar: / # odmdelete -q 'en_name=mail_all_errlog' -o errnotify
0518-307 odmdelete: 1 objects deleted.
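
If several notification objects are needed, the stanza file from step 1 can be generated instead of typed by hand. The following is a minimal sketch, not a definitive tool: the function name make_errnotify and its argument order are made up for this example, and the stanza layout simply follows the two samples above.

```shell
# Hypothetical helper: print an errnotify stanza suitable for odmadd.
# $1 = en_name, $2 = mail address, $3 = optional class (H, S, O, U),
# $4 = optional type (e.g. PERM)
make_errnotify() {
    en_name=$1 en_addr=$2 en_class=$3 en_type=$4
    echo "errnotify:"
    echo "  en_name = \"$en_name\""
    if [ -n "$en_class" ]; then echo "  en_class = $en_class"; fi
    echo "  en_persistenceflg = 1"
    if [ -n "$en_type" ]; then echo "  en_type = $en_type"; fi
    # $1, $9 and `hostname` must reach the ODM unexpanded, so the format
    # string is single-quoted; \\" prints a literal \" in the stanza.
    printf '  en_method = "/usr/bin/errpt -a -l $1 | mail -s \\"errpt $9 on `hostname`\\" %s"\n' "$en_addr"
}

# Assumed usage on an AIX host (the address is a placeholder):
#   make_errnotify mail_perm_hw admin@example.com H PERM > /tmp/errnotify.txt
#   odmadd /tmp/errnotify.txt
```

Omitting the last two arguments gives the "all errpt entries" variant.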

(source: http://www.kristijan.org/2012/06/error-report-mail-notifications-with-errnotify/)

--------------------------------------------------------------------------------------------

DIAGRPT: (DIAG logs reporter)

diagrpt                   Displays previous diagnostic results
cd /usr/lpp/diag*/bin
    ./diagrpt -r          Displays the short version of the Diagnostic Event Log
    ./diagrpt -a          Displays the long version of the Diagnostic Event Log



--------------------------------------------------------------------------------------------

ALOG:

/var/adm/ras             this directory contains the master log files (alog command can read these files)
                         e.g. /var/adm/ras/conslog

alog -L                  shows what kind of logs there are (console, boot, bosinst...), these can be used with: alog -ot <type>
alog -Lt <type>          shows the attributes of a type (console, boot ...): size, path to logfile...
alog -ot console         lists the messages that were written to the console
alog -ot boot            shows the bootlog
alog -ot lvmcfg          lvm log file, shows what lvm commands were used (alog -ot lvmt: shows lvm commands and libs)


--------------------------------------------------------------------------------------------

21 comments:

basanth said...

How do I maintain log files in AIX?

aix said...

With errclear you can make the error log smaller. For other files, read the end of this link on how to make them smaller: http://aix4admins.blogspot.hu/2011/05/superblock-in-jfs-superblock-is-first.html

Anonymous said...

Hey, one doubt: why do we have both errpt and syslog in the system? Can't we work with just one of them? What is the difference between the two?

aix said...

Hey! Syslog is a "system logging facility". It contains not only error messages, but also information about logins/logouts (with ftp/ssh), warnings and info messages. What you see there can be tailored to your needs. errpt contains mainly error reports, and some hardware-related errors show up there first. However, it is also possible to transfer errors from errpt to syslog.

Hope this helps,
Balazs

Anonymous said...

I get the below error message during I/O on AIX 6.1 as well as AIX 7.1:
"stderr: Cannot set primary dump device, ERROR: /dev/sysdump. Error code 0xc.. "
I have also increased the primary dump space, but I still don't see any change.

What can be changed? I am a beginner in AIX, please provide a solution.

Anonymous said...

Hi,

Can you please look at the below errors from the AIX syslog? Is this an issue? How can I avoid these errors without affecting my LPAR?

Detail Data
SYSLOG MESSAGE
<27>Aug 23 08:24:28 syslog: slp: 0660-084 [3473530] The SA failed to decode and compute received message: Parse Error (-2).

Detail Data
SYSLOG MESSAGE
<27>Aug 23 08:24:28 syslog: slp: [3473530] decode_srvreg -- __srv_reg_local failed with rc = -2.


Detail Data
SYSLOG MESSAGE
<27>Aug 23 08:24:28 syslog: slp: 0660-065 [3473530] Impossible to parse attribute (ca-uid=file:///var/opt/tivoli/ep/runtime/agent),(am-host=),(ca-ips=10.xx.xx.xxx),(ca-basic-port=9510),(ca-cert-port=

Anonymous said...

What does sysdumpdev -l show?

Anonymous said...

Known issue and apar available from IBM

Anonymous said...

Hello,

Regarding syslogd: would you be so kind as to write whether IBM supports any solution for sending logs via SSL or TLS to a remote server?
I am looking for a solution for sending encrypted logs to a remote RHEL server.

Unknown said...

Hello,
When I run the errpt command, no errors are shown for the past 6 months, and my daemons are working properly. How can I find out whether I am getting a correct report?
Thank you

aix said...

Hi, create a test error message with the "errlogger" command (errlogger "This is a test message")

-Balazs

Anonymous said...

I tested with errlogger as well. Still, how can I find out whether I am getting a correct error report?

Thank you.

Anonymous said...

Hi,

I also saw the same kind of messages in my logs. Can you please tell me the APAR details to fix it?

bogo said...

Hi, is there a way to recover entries in the errpt that were accidentally cleared previously?

Unknown said...

What is NFS, and how do I share files with it? Please explain, anybody.

venkatesh said...

Hi Admin,

I am not getting any output when I run the errpt command on the AIX XXXX 3 5 00C7E35A4C00 server. Could you please help me with this? We have a task to reboot this server.


A quick response will be appreciated.


Thanks,

Venkat

Unknown said...

Thanks Mr. Author for compiling this post. Successfully tested errpt notification by following above steps.

Unknown said...

While monitoring, I found the error "65DE6DE3 0328181316 P S hdisk30 REQUESTED OPERATION CANNOT BE PERFORMED" in the errpt command output.

Any help with this error is appreciated.

Anonymous said...

Did you get any fix for this?

Anonymous said...

Hi,

What are servermon logs? Are they different from the OS logs?