Sub-System Log Reference

An example helps to explain the sub-system log reference object of the appliance MIB. Assume a power supply fails. Through the Intelligent Platform Management Interface (IPMI), the ipmievd process learns of the failure and generates a log message. ipmievd sends the message to the rsyslogd process. In addition to writing a record to the messages log file, rsyslogd writes the record to the named pipe on which lsta is listening. lsta determines that the event is trap-worthy. The agent formats the necessary parameters and uses the net-snmp snmptrap command to send a SAFENET-APPLIANCE-MIB::powerSupplyAttentionNotify notification with a SAFENET-APPLIANCE-MIB::ssLogReference object to the configured SNMP V3 user.

See the following subsections for information on these types of traps:

>Fan Failure

>Power Supply Failure

>Motherboard Failure

>Disk Drive Failure

>NTLS Failure

>Crypto Failure

The trap handler receives the notification in a packet that includes the following example segment:

SAFENET-APPLIANCE-MIB::powerSupplyAttentionNotify, SAFENET-APPLIANCE-MIB::ssLogReference = [myLuna:192.168.0.101 / 2012 Feb 29 12:05:01 / messages / ipmievd[1234] / 1]

where

>myLuna is the hostname of the Luna Network HSM appliance

>192.168.0.101 is the IP address of the first Ethernet interface on the appliance

>2012 Feb 29 12:05:01 is the date and timestamp recorded in the log file

>messages is the log file that contains the event leading to the trap notification

>ipmievd[1234] is the process that logged the message, with its process ID in brackets

>1 is a boolean that indicates whether the trap is for an assertion (1) or de-assertion (0) event.
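A trap handler typically needs these fields individually. The following is a minimal Python sketch of one way to split the value, assuming the ssLogReference always carries the five slash-separated fields shown in the example segment. The names LogReference and parse_log_reference are illustrative and are not part of the appliance or of net-snmp.

```python
import re
from typing import NamedTuple


class LogReference(NamedTuple):
    hostname: str
    ip_address: str
    timestamp: str
    log_file: str
    process: str
    pid: int
    asserted: bool


# Layout assumed from the example segment:
# [host:ip / timestamp / logfile / process[pid] / flag]
_REF_PATTERN = re.compile(
    r"^\[(?P<host>[^:]+):(?P<ip>\S+) / (?P<ts>.+?) / (?P<log>\S+) / "
    r"(?P<proc>[^\[\]]+)\[(?P<pid>\d+)\] / (?P<flag>[01])\]$"
)


def parse_log_reference(value: str) -> LogReference:
    """Split an ssLogReference value into the fields described above."""
    match = _REF_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized ssLogReference value: {value!r}")
    return LogReference(
        hostname=match["host"],
        ip_address=match["ip"],
        timestamp=match["ts"],
        log_file=match["log"],
        process=match["proc"],
        pid=int(match["pid"]),
        asserted=match["flag"] == "1",  # 1 = assertion, 0 = de-assertion
    )


ref = parse_log_reference(
    "[myLuna:192.168.0.101 / 2012 Feb 29 12:05:01 / messages / ipmievd[1234] / 1]"
)
print(ref.hostname, ref.log_file, ref.process, ref.asserted)
```

The timestamp is kept as a string so that it can be compared directly with the text in the log file.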

This information is enough to identify the specific log entry that led to the trap. If you log in to the appliance and view the messages log file:

[myLuna] lunash:>syslog tail -logname messages

you see the following entry:

2012 Feb 29 12:05:01 myLuna  local4 notice  ipmievd[1234]: ***TEST : Power Supply sensor PSU2_Status    . - Failure detected Asserted

From this log message, you know that the second power supply unit has failed and you can dispatch a technician to investigate.

Note that the Luna appliance tags log messages generated by the lunash:>sysconf snmp trap test command with a ***TEST designator. This designator allows you to distinguish genuine events from test events.
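If you process the messages log with a script, the ***TEST designator can be checked mechanically. The sketch below is a hypothetical Python helper, not appliance software. It assumes every entry follows the layout of the sample entry above (timestamp, hostname, facility, level, process[pid], then the message) and that the designator appears at the start of the message body.

```python
import re

# Layout assumed from the sample entry in this section:
# <timestamp> <host> <facility> <level> <process>[<pid>]: <message>
_ENTRY = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) +(?P<host>\S+) +"
    r"(?P<facility>\S+) +(?P<level>\S+) +(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<msg>.*)$"
)


def parse_entry(line: str) -> dict:
    """Return the fields of one log entry plus an is_test flag."""
    match = _ENTRY.match(line)
    if match is None:
        raise ValueError(f"unrecognized log entry: {line!r}")
    fields = match.groupdict()
    # Entries created by the trap test command carry the ***TEST designator.
    fields["is_test"] = fields["msg"].startswith("***TEST")
    return fields


sample = (
    "2012 Feb 29 12:05:01 myLuna  local4 notice  ipmievd[1234]: "
    "***TEST : Power Supply sensor PSU2_Status    . - Failure detected Asserted"
)
entry = parse_entry(sample)
print(entry["ts"], entry["proc"], entry["is_test"])
```

The timestamp and process fields returned here are the same values carried in the ssLogReference object, so they can be used to match a trap to its log entry.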

Fan Failure

lsta generates a trap for a fan failure when a log message from the ipmievd process contains any of { “Fan1A”, “Fan1B”, “Fan2A”, “Fan2B”, “Fan3A”, “Fan3B”} in its body and meets one of the following conditions:

>Body of log message contains the text “Lower Critical going low” and the threshold reported represents an assert condition

>Body of log message contains the text “Lower Non-recoverable going low” and the threshold reported represents an assert condition.

Recall from Threshold Events that IPMI reports assert and de-assert conditions. If the comparison in the (Reading xxxx < Threshold yyyy RPM) segment of the log message is true, the message represents an assert event. If the comparison is false, the message represents a de-assert event.

Fan failures correspond to the fanAttentionNotify NOTIFICATION-TYPE of the SAFENET-APPLIANCE-MIB.

Note that the Luna administrative shell prohibits the ‘<’ and ‘>’ characters in parameters, but some traps key off threshold readings that rely on these comparison operators. To create test log messages of this sort, use the string “.lt” or “.gt” in place of the ‘<’ or ‘>’ character in the LunaSH command.
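If you generate test commands from real log messages, the substitution can be scripted. The following Python sketch builds a command line using only the options that appear in the examples in this document. The function name build_trap_test_command is hypothetical, and the rejection of double quotes in the message is an assumption made to keep the quoted string well-formed.

```python
def build_trap_test_command(facility, level, process, message, with_pid=False):
    """Format a `sysconf snmp trap test` command line for pasting into LunaSH."""
    if '"' in message:
        # Assumption: keep the quoted -message argument well-formed.
        raise ValueError("message must not contain double quotes")
    # LunaSH prohibits '<' and '>' in parameters; use the documented tokens.
    safe = message.replace("<", ".lt").replace(">", ".gt")
    command = (
        f"sysconf snmp trap test -logfacility {facility} -loglevel {level} "
        f'-process {process} -message "{safe}"'
    )
    return command + " -pid" if with_pid else command


print(
    build_trap_test_command(
        "local4",
        "notice",
        "ipmievd",
        "Fan sensor Fan3B          . Lower Critical going low  "
        "(Reading 0 < Threshold 2000 RPM)",
    )
)
```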

You can copy and paste the following examples into a LunaSH session to create test log messages that generate fan traps (the first, second, fourth and fifth examples create assert events; the third and sixth examples create de-assert events):

lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Fan sensor Fan3B          . Lower Critical going low  (Reading 0 .lt Threshold 2000 RPM)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Fan sensor Fan3B          . Lower Critical going low  (Reading 2000 .lt Threshold 2000 RPM)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Fan sensor Fan3B          . Lower Critical going low  (Reading 21000 .lt Threshold 2000 RPM)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Fan sensor Fan3B          . Lower Non-recoverable going low  (Reading 500 .lt Threshold 1000 RPM)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Fan sensor Fan3B          . Lower Non-recoverable going low  (Reading 1000 .lt Threshold 1000 RPM)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Fan sensor Fan3B          . Lower Non-recoverable going low  (Reading 5100 .lt Threshold 1000 RPM)"
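The assert and de-assert evaluation can be illustrated with a short Python sketch. It is not lsta's implementation. It parses the (Reading ... Threshold ...) segment, accepting either the comparison character or its “.lt”/“.gt” substitute, and evaluates the comparison. How lsta treats a reading equal to the threshold is not stated explicitly, so the sketch exposes a hypothetical equal_is_assert parameter whose default follows the labeling of the second and fifth examples above as assert events.

```python
import re
from typing import Optional

_THRESHOLD = re.compile(
    r"\(Reading (?P<reading>-?\d+(?:\.\d+)?) (?P<op>\.lt|\.gt|<|>) "
    r"Threshold (?P<threshold>-?\d+(?:\.\d+)?) (?P<unit>\w+)\)"
)


def classify_threshold(message: str, equal_is_assert: bool = True) -> Optional[str]:
    """Return "assert" or "de-assert" for a threshold message, else None."""
    match = _THRESHOLD.search(message)
    if match is None:
        return None  # not a threshold-style message
    reading = float(match["reading"])
    threshold = float(match["threshold"])
    if reading == threshold:
        # Assumption: boundary handling is configurable in this sketch.
        crossed = equal_is_assert
    elif match["op"] in (".lt", "<"):
        crossed = reading < threshold
    else:
        crossed = reading > threshold
    return "assert" if crossed else "de-assert"


for reading in (0, 2000, 21000):
    text = (
        "Fan sensor Fan3B          . Lower Critical going low  "
        f"(Reading {reading} .lt Threshold 2000 RPM)"
    )
    print(reading, classify_threshold(text))
```

Pass equal_is_assert=False to evaluate the comparison strictly.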

Power Supply Failure

lsta generates a trap for a power supply failure when a log message from the ipmievd process contains any of {"PSU1_Status", "PSU1_+12V_value", "PSU1 Temp_value", "PSU1 FAN_value", "PSU2_Status", "PSU2_+12V_value", "PSU2 Temp_value", "PSU2 FAN_value", "Power Supply"} in its body and meets one of the following conditions:

>Body of log message contains the text “Failure detected Asserted”

>Body of log message contains the text “Failure detected Deasserted”

>Body of log message contains the text “Presence detected Asserted”

>Body of log message contains the text “Presence detected Deasserted”

>Body of log message contains the text “- Transition to Power Off”

>Body of log message contains the text “Lower Non-recoverable going low” and the threshold reported represents an assert condition

>Body of log message contains the text “Upper Non-recoverable going high” and the threshold reported represents an assert condition.

Power supply failures correspond to the powerSupplyAttentionNotify NOTIFICATION-TYPE of the SAFENET-APPLIANCE-MIB.

Here is text you can use to create power supply traps:

lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Power Supply sensor PSU2_Status    . - Failure detected Asserted"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Power Supply sensor PSU1_Status    . - Presence detected Deasserted"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Power Supply sensor - Transition to Power Off"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Voltage sensor PSU2_+12V_value. Upper Non-recoverable going high  (Reading 14.538 .gt Threshold 13.392 Volts)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Voltage sensor PSU2_+12V_value. Upper Non-recoverable going high  (Reading 12.538 .gt Threshold 13.392 Volts)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Voltage sensor PSU2_+12V_value. Lower Non-recoverable going low  (Reading 10.548 .lt Threshold 11.232 Volts)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Voltage sensor PSU2_+12V_value. Lower Non-recoverable going low  (Reading 12.548 .lt Threshold 11.232 Volts)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "PSU1 Temp_value. Upper Non-recoverable going high  (Reading 117 .gt Threshold 115 Degrees)"
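The power supply conditions can be expressed as a simple substring check. The Python sketch below is illustrative only and is not how lsta is implemented. The sensor names and phrases come from the lists above, and the function name is hypothetical. For the two threshold phrases, the sketch checks only for the text and does not evaluate whether the reported threshold represents an assert condition.

```python
PSU_SENSORS = (
    "PSU1_Status", "PSU1_+12V_value", "PSU1 Temp_value", "PSU1 FAN_value",
    "PSU2_Status", "PSU2_+12V_value", "PSU2 Temp_value", "PSU2 FAN_value",
    "Power Supply",
)
STATUS_PHRASES = (
    "Failure detected Asserted",
    "Failure detected Deasserted",
    "Presence detected Asserted",
    "Presence detected Deasserted",
    "- Transition to Power Off",
)
# These two phrases also require an assert threshold, which is not checked here.
THRESHOLD_PHRASES = (
    "Lower Non-recoverable going low",
    "Upper Non-recoverable going high",
)


def is_power_supply_candidate(process: str, message: str) -> bool:
    """True if the message names a PSU sensor and contains a listed phrase."""
    if process != "ipmievd":
        return False
    if not any(sensor in message for sensor in PSU_SENSORS):
        return False
    return any(phrase in message for phrase in STATUS_PHRASES + THRESHOLD_PHRASES)


print(is_power_supply_candidate(
    "ipmievd", "Power Supply sensor PSU2_Status    . - Failure detected Asserted"
))
```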

Motherboard Failure

lsta generates a trap for a motherboard failure when a log message from the ipmievd process contains any of { "CPU", "VRD", "PCH", "Inlet", "CHA DIMM 0", "CHA DIMM 1", "CHA DIMM 2", "CHB DIMM 0", "CHB DIMM 1", "CHB DIMM 2", "RAM TMax", "CPU_VCORE", "VBAT", "3VSB", "3VMain", "+5V", "+12V"} in its body and meets one of the following conditions:

>Body of log message contains the text “Lower Critical going low” and the threshold reported represents an assert condition

>Body of log message contains the text “Upper Critical going high” and the threshold reported represents an assert condition

>Body of log message contains the text “Upper Non-recoverable going high” and the threshold reported represents an assert condition.

Motherboard failures correspond to the motherboardAttentionNotify NOTIFICATION-TYPE of the SAFENET-APPLIANCE-MIB.

Here are examples to generate motherboard traps:

lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Voltage sensor VBAT           . Lower Critical going low  (Reading 1.63 .lt Threshold 2.80 Volts)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Voltage sensor VBAT           . Lower Critical going low  (Reading 3.30 .lt Threshold 2.80 Volts)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Temperature sensor CPU            . Upper Critical going high  (Reading 75 .gt Threshold 72 Degrees)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Temperature sensor CPU            . Upper Critical going high  (Reading 70 .gt Threshold 72 Degrees)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Temperature sensor CPU            . Upper Non-recoverable going high  (Reading 92 .gt Threshold 89 Degrees)"
lunash:>sysconf snmp trap test -logfacility local4 -loglevel notice -process ipmievd -message "Temperature sensor CPU            . Upper Non-recoverable going high  (Reading 85 .gt Threshold 89 Degrees)"

Disk Drive Failure

lsta generates a trap for a disk drive failure when a log message from the smartd process meets the following condition:

>Severity of the message is “crit”.

Disk drive failures correspond to the diskDriveAttentionNotify NOTIFICATION-TYPE of the SAFENET-APPLIANCE-MIB.

Use the following text to create a disk drive trap:

lunash:>sysconf snmp trap test -logfacility daemon -loglevel crit -process smartd -message "Device: /dev/sda, Temperature 45 Celsius reached limit of 44 Celsius (Min/Max 31/49)" -pid

NTLS Failure

lsta generates a trap for an NTLS failure when a log message from the NTLS process meets either of the following conditions:

>Severity of the message is “err”.

>Severity of the message is “crit”.

NTLS failures correspond to the ntlsAttentionNotify NOTIFICATION-TYPE of the SAFENET-APPLIANCE-MIB.

Here are examples to create NTLS traps:

lunash:>sysconf snmp trap test -logfacility local5 -loglevel crit -process NTLS -message "error :  0xc0000002 : Unable to create a new connection. " -pid
lunash:>sysconf snmp trap test -logfacility local5 -loglevel crit -process NTLS -message "info : 0 : NTLS CRASH AND BURN! Stack dump saved to /var/log/ntls_bt_2012-02-29_12:05:01" -pid
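Unlike the ipmievd rules, the disk drive and NTLS rules depend only on the process name and the message severity. The following Python sketch shows that mapping as a lookup table. The table and function names are hypothetical, and the sketch illustrates the documented conditions rather than lsta's actual code.

```python
# process name -> (severities that trigger a trap, notification type)
SEVERITY_RULES = {
    "smartd": ({"crit"}, "diskDriveAttentionNotify"),
    "NTLS": ({"err", "crit"}, "ntlsAttentionNotify"),
}


def notification_for(process: str, level: str):
    """Return the notification type for a severity-based rule, or None."""
    rule = SEVERITY_RULES.get(process)
    if rule is None:
        return None  # process is not covered by a severity-based rule
    levels, notification = rule
    return notification if level in levels else None


print(notification_for("NTLS", "crit"))
print(notification_for("smartd", "err"))
```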

Crypto Failure

For Luna SA 5/6, lsta generates a trap for a crypto failure (a failure of the internal HSM of the Luna appliance) when either of the following conditions is met:

>For the kernel process, body of log message contains the text “HSM crashed:”

>For the sysstatd process, body of log message contains the text “30”

For Luna SA 7, lsta generates a trap for a crypto failure when the following condition is met:

>For the kernel process, body of log message contains the text “ALM”

Crypto failures correspond to the cryptoAttentionNotify NOTIFICATION-TYPE of the SAFENET-APPLIANCE-MIB.

Use the following examples to simulate a crypto failure on a Luna SA 5/6 appliance:

lunash:>sysconf snmp trap test -logfacility kern -loglevel info -process kernel -message "NOTE: viper0: hsm log: LOG(CRITICAL)  HSM crashed:"
lunash:>sysconf snmp trap test -logfacility user -loglevel info -process sysstatd -message "Luna System State Server - OOS Errors: 30,100,60!"

Use the following example to simulate a crypto failure on a Luna SA 7 appliance:

lunash:>sysconf snmp trap test -logfacility kern -loglevel info -process kernel -message "k7pf0: [HSM] ALM2007: HSM zeroized"