HSM Alarm Codes
The Luna PCIe HSM 7 alarm messages indicate error conditions on the HSM card that might require user intervention. The alarms apply to a Luna HSM that is compliant with FIPS 140-2 Level 3. The alarm messages provide enough detail to alert HSM users to important events. Each alarm message has a unique character string as its message ID, which allows higher-level tools on the host system to parse the logs for alarm message IDs and generate notifications.
Messages are saved to the system log file on Linux host systems, allowing host application software such as SNMP to parse the log file, and to the Windows Event Viewer on Windows host systems.
Messages can be retrieved with the "dmesg" utility, which reads the driver log. The driver log collects messages from the bootloader (BL), the firmware (FW), and the Host Driver itself.
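As an illustration of how a host-side tool might pick out alarm message IDs, the following minimal Python sketch parses lines shaped like the samples shown later in this document (for example, "k7pf0: [HSM] ALM2024: Stored data integrity verify error"). The function names and the regular expression are assumptions made for this example; actual log lines might carry additional prefixes such as timestamps, so the pattern searches anywhere in the line.

```python
import re

# Assumed pattern, modeled only on the sample lines in this document:
#   k7pf0: [HSM] ALM2024: Stored data integrity verify error
ALARM_RE = re.compile(r"ALM(?P<id>\d{4}):\s*(?P<text>.*)")


def parse_alarm(line):
    """Return (alarm_id, message) for a log line, or None if it has no alarm."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    return match.group("id"), match.group("text").strip()


def find_alarms(lines):
    """Collect every alarm found in an iterable of log lines."""
    return [alarm for alarm in map(parse_alarm, lines) if alarm is not None]
```

On a Linux host, the output of dmesg could be piped into a script that passes its standard input to find_alarms.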
This section contains the following information:
>Alarm Generation and Handling
Alarm Generation and Handling
Alarm messages are generated when the HSM Boot Loader, Firmware, or Host Driver software detects an unexpected condition. Other alarm messages are generated after unexpected interrupts or tamper events. For each of these problems, detailed error information and an alarm message are output to notify the user that an event needs attention.
At least one alarm message is output by BL, FW, or Host Driver as a result of each tamper event. Depending on the type of tamper, all of them may report an alarm message related to the same tamper event. The message timestamps help you identify which alarm messages belong to the same tamper event. Tamper alarm messages from BL, FW, and Host Driver have the same text description for the same tamper event. A specific type of tamper event is not reported again until FW clears the tamper information in the tamper circuit. If the tamper event is reported after that, then either a new tamper condition has been detected or the same tamper event is still active and cannot be cleared.
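Because tamper alarms from BL, FW, and Host Driver share the same text for the same event, a monitoring tool could correlate them by text and timestamp. The sketch below is one possible approach, not a documented procedure: the tuple layout, the function name, and the 60-second window are all assumptions chosen for illustration.

```python
def group_tamper_alarms(alarms, window_seconds=60.0):
    """Group (timestamp, source, text) tuples that likely describe one event.

    Alarms join the same group when they share the same text and each one
    follows the previous member of that group within window_seconds.
    The window size is an assumed value, not a documented one.
    """
    groups = []
    latest_group_by_text = {}
    for timestamp, source, text in sorted(alarms):
        group = latest_group_by_text.get(text)
        if group is not None and timestamp - group[-1][0] <= window_seconds:
            group.append((timestamp, source, text))
        else:
            group = [(timestamp, source, text)]
            groups.append(group)
            latest_group_by_text[text] = group
    return groups
```

A later alarm with the same text that falls outside the window starts a new group, which matches the case where a tamper is reported again after FW has cleared it.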
Alarm Handling for Special Situations
Alarm messages are still generated during rare occurrences where BL, FW, or Host Driver might be in an abnormal state.
As long as the Host Driver is running, the BL and FW are able to output their alarm messages to the DLOG (driver log), which can be parsed to notify the user. If either BL or FW stops execution due to error detection, it outputs an alarm message to the Host Driver, which stores it in DLOG. All BL and FW checking for alarm conditions then stops, but all HW tamper event monitoring (soft and hard tampers) remains enabled, including Host Driver monitoring. The card reset caused by these tampers restarts BL and possibly FW, and the alarm messages are output. The following situations are also handled:
>BL starts before Host Driver is loaded (System power-up): Without Host Driver available, BL outputs all alarms only to an internal HSM log. When the Host Driver loads it resets the HSM card, causing BL to start again. BL can then send any new alarms to the host driver and either stop or proceed to FW, as the situation allows.
•For an L3 card if FW is started it will output alarm messages for any existing tamper conditions. Any tamper event alarm messages including those not sent out while the Host Driver was not loaded can be fetched from the FRAM Log.
NOTE If needed, use
>FW halted due to internal error: FW can start only when the Host Driver is running, so the FW halted alarm message is always stored in DLOG. No further BL or FW alarm messages are generated in this state until the next card reset.
>FW in locked state (tamper clear required): An alarm message is generated to signal locked state is active. FW is still doing periodic checks and FW alarm messages are still possible. Only a small subset of FW commands is available.
>FW in Secure Transport Mode (STM): An alarm message is generated to signal STM is active. FW is still doing periodic checks and FW alarm messages are still possible. Only a small subset of FW commands is available.
>Host Driver loses communications with the HSM card: If the Host Driver has any errors communicating with the K7 (BL or FW), it generates alarm messages. The Host Driver also periodically checks that the Luna PCIe HSM 7 card is still present on the PCIe bus (for example, a chassis open event causes a cold reset of the HSM), and if there is no response for a pre-determined period of time an alarm message is generated.
FRAM LOG
The Boot Loader and firmware also store all alarm event information in the FRAM Log in the non-volatile FRAM device on the K7. There is no specific FRAM Log partition for DLOG or alarm messages. Use LUNADIAG to retrieve the FRAM Log contents and return it to Thales Customer Support for further analysis. In the event the Host Driver is unavailable to receive this information, it is still present in the FRAM Log and can be retrieved long after the alarm event has finished.
List of HSM Alarm Codes
ALM ID | Alarm Message | Description | Info |
---|---|---|---|
Host Driver | | | Tamper Flag |
0001 | Soft tamper - over voltage | HSM voltage is above the operating range. HSM will stay in reset until voltage goes back in range. | HCCSR: VST |
0002 | Soft tamper - temperature (nnC) | HSM temperature (nn degrees Celsius) is outside the range (-2C to 80C). HSM will stay in reset until temperature goes back in range. | HRCSR: TST |
0003 | Soft tamper - indeterminate cause | A soft tamper occurred but cannot determine the cause. | |
0004 | Hard tamper - high temperature | HSM temperature is higher than 88C. | HT_T |
0005 | Hard tamper - low temperature | HSM temperature is lower than -40C | LT_T |
0006 | Hard tamper - over voltage | HSM voltage is higher than the maximum allowed. | OV_T, TC3_T |
0009 | Hard tamper - oscillator failure | HSM tamper clock oscillator has failed | OSC_T |
0010 | Decommission signal triggered | Decommission button (connector P9) has been pressed. | TC2_T |
0011 | Hard tamper - indeterminate cause | A hard tamper occurred but cannot determine the cause. | |
0012 | Hardware Error | Error detected in device hardware | |
0013 | High Temperature - nnC | HSM has reached nn degrees Celsius and needs to be cooled to avoid tampering | |
0014 | Low Battery | HSM battery voltage is below 2.75V and needs to be replaced soon. | |
0015 | PCIe Link Failure | HSM no longer appears on PCIe bus. Chassis may have been opened. | |
0016 | Device Error | Internal error detected during communications with HSM | |
0017 | Request Timed Out | Request to HSM took too long | |
Boot Loader | | | Tamper Flag |
1000 | Unknown alarm ID xx in boot loader | Illegal alarm ID used in Boot Loader. | |
1001 | HSM restart required | Soft or hard tamper occurred. HSM needs to be restarted (reset) before firmware is allowed to run. | |
1003 | HSM halted - internal boot loader error | Boot Loader detected an error during diagnostics and did not jump to FW. | |
1004 | Warning - boot loader diagnostic error | Boot Loader detected an error during diagnostics that does not stop execution but needs to be investigated (for example, fan, VPD, or RTC problems). | |
1005 | HSM FW signature check failed | The FW image on the HSM failed authentication and will not be executed. | |
1006 | Soft tamper temperature/voltage | HSM voltage or temperature is outside the acceptable range. HSM will stay in reset until back in range. | PORSM status reg. |
1007 | Hard tamper - high temperature | HSM temperature is higher than 88C. | HT_T |
1008 | Hard tamper - low temperature | HSM temperature is lower than -40C. | LT_T |
1009 | Hard tamper - over voltage | HSM voltage is higher than the maximum allowed. | OV_T, TC3_T |
1012 | Hard tamper - oscillator failure | HSM tamper clock oscillator has failed | OSC_T |
1013 | Hard tamper - tamper configuration invalid | HSM tamper configuration lost (set to defaults) due to power loss. | FS_T |
1014 | Chassis opened | Chassis open switch (connector P7) has been triggered. | TC1_T |
1015 | HSM removed from chassis | HSM was removed from host chassis then re-inserted | CS |
1016 | Decommission signal triggered | Decommission button (connector P9) has been pressed. | TC2_T |
Firmware | | | |
2000 | Unknown alarm ID xx in firmware | Illegal alarm ID used in firmware. | |
2001 | High temperature warning activated | HSM temperature is above 75C (FW checks every 2 minutes). This warning will not re-appear unless temperature drops below 75C and goes back up again. | |
2002 | High temperature warning deactivated | HSM temperature has dropped below 75C. | |
2003 | Battery low voltage warning | Battery voltage is below 2.75V (FW checks every hour). This warning will not re-appear unless voltage goes above 2.75V then back down. Battery should be replaced soon. | |
2004 | Battery depleted | Battery voltage is below 2.5V (FW checks every hour). HSM FW will be halted. Battery must be replaced. | |
2005 | HSM deactivated | Auto-activation data has been cleared | |
2006 | HSM decommissioned by FW | All user crypto material has been invalidated due to KEK CRC failure, decommission signal, or tamper (if decommission on tamper enabled). | |
2007 | HSM zeroized | All user crypto material has been erased. HSM product credentials still exist. This can occur for a variety of reasons including manual zeroization. | |
2008 | Internal data corruption | Settings that control tamper monitoring are incorrect (incorrect tamper monitoring settings are corrected), or Critical Security Parameter data (MTK) is invalid. Otherwise, there was an unexpected change to the tamper security write protection. | |
2009 | HSM halted - internal firmware error | FW detected an error which caused it to halt itself. Can also be errors generated by the kernel such as: bad exception, out of memory, unrecoverable errors. | |
2010 | HSM locked - tamper clear required | Limited set of FW commands available due to an HSM tamper condition. Tamper needs to be cleared before proceeding. Controlled tamper recovery must be enabled for this message to appear. | |
2011 | HSM unlocked - tamper clear done | Tamper was cleared when in controlled tamper recovery mode. | |
2012 | HSM in secure transport mode | Checked on every FW start-up to remind the user to do a recovery operation. Limited set of FW commands available. | |
2013 | HSM recovered from secure transport mode | HSM in secure transport mode was recovered back to normal mode. | |
2014 | Auto-activation data invalid – HSM deactivated | FW checked auto-activation data validity and failed. Re-activation required. | |
2015 | Hard tamper - high temperature | (L3 only) HSM temperature was higher than 88C. | HT_T |
2016 | Hard tamper - low temperature | (L3 only) HSM temperature was lower than -40C. | LT_T |
2017 | Hard tamper - over voltage | (L3 only) HSM voltage was higher than the maximum allowed. | OV_T, TC3_T |
2018 | Hard tamper - oscillator failure | (L3 only) HSM tamper clock oscillator has failed | OSC_T |
2019 | Hard tamper - tamper configuration invalid | (L3 only) HSM tamper configuration lost (set to defaults) due to power loss. | FS_T |
2020 | Chassis opened | Chassis open switch (connector P7) has been triggered. | TC1_T |
2021 | HSM was removed from chassis | HSM was removed from host chassis just before this FW execution. HSM will be deactivated. | CS |
2022 | Decommission signal triggered | Decommission button (connector P9) has been pressed. | TC2_T |
2023 | HSM fan x failure | Fault detected in HSM on-board fan (fan 1 or fan 2). | |
2024 | Stored data integrity verify error | Integrity of an object or CSP did not verify correctly. See Stored Data Integrity. | |
2025 | Firmware update in progress | A firmware update procedure is in progress. Recorded in the logs, but not shown onscreen. [ Added with firmware 7.7.0 ] | |
2026 | Firmware update canceled | A firmware update procedure was halted due to insufficient memory to continue. The HSM rolls back to the previous firmware. [ Added with firmware 7.7.0 ] | |
2027 | HSM storage exceeded | Attempt to use storage beyond the size of a partition (which was doubled with firmware 7.7.0). The update proceeds to completion, but some restrictions apply to the affected partition; see V0 and V1 Partitions. The alarm is recorded only in the logs and is not shown onscreen, although the message "HSM storage is currently over capacity" is shown onscreen. [ Added with firmware 7.7.0 ] | |
2028 | HSM capacity exceeded | Attempt to exceed the total memory size of the HSM cancels the operation. Refer to your backups. [ Added with firmware 7.7.0 ] |
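The alarm IDs in the table fall into three numeric ranges, one per reporting component. A notification tool could use that pattern to label an alarm by its source, as in the minimal Python sketch below. The range boundaries are inferred from the IDs listed above (0001 to 0017, 1000 to 1016, 2000 to 2028) and are an assumption, not a documented guarantee; the function name is hypothetical.

```python
def alarm_source(alarm_id):
    """Map a four-digit alarm ID string to the component that reports it.

    Ranges are inferred from the alarm table in this document and are an
    assumption: 0001-0999 Host Driver, 1000-1999 Boot Loader,
    2000-2999 Firmware.
    """
    number = int(alarm_id)
    if 1 <= number <= 999:
        return "Host Driver"
    if 1000 <= number <= 1999:
        return "Boot Loader"
    if 2000 <= number <= 2999:
        return "Firmware"
    return "Unknown"
```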
HSM Alarm Code Samples
This section shows the details of some of the alarm event scenarios.
ALM = alarm message.
Temperature - High Warning
If HSM temperature reaches 75 degrees Celsius and then drops back below 75C the following actions occur:
>Temperature >= 75C
•After 5 minutes at this temperature or higher, the Host Driver receives a 'High Temperature Warning' interrupt and issues an ALM
•Firmware checks temperature at start-up and once per hour
•Firmware issues ALM for high temperature warning activated
>Temperature < 75C
•Firmware issues ALM for high temperature warning deactivated
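The activated and deactivated alarms behave like a latch: the warning is issued once when the threshold is crossed and is not repeated until the temperature has dropped back below it. The following Python sketch simulates that behavior for a series of readings. It is an illustration only; the function name and the use of a plain list of readings are assumptions, and the real firmware check interval is not modeled.

```python
WARNING_THRESHOLD_C = 75  # high temperature warning level from this section


def warning_events(readings_c):
    """Return the warning transitions for a sequence of temperature readings.

    Each event is a (kind, temperature) tuple where kind is "activated" or
    "deactivated". Repeated readings on the same side of the threshold
    produce no additional events.
    """
    active = False
    events = []
    for temperature in readings_c:
        if not active and temperature >= WARNING_THRESHOLD_C:
            active = True
            events.append(("activated", temperature))
        elif active and temperature < WARNING_THRESHOLD_C:
            active = False
            events.append(("deactivated", temperature))
    return events
```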
Temperature – High Soft Tamper
When the temperature starts below 75C and reaches the high soft tamper limit of 80C and then drops back below 75C the following actions occur:
>Temperature >= 75C
•After 5 minutes at this temperature or higher, the Host Driver receives a High Temperature Warning interrupt and issues an ALM
•Firmware issues ALM for activation of high temperature warning
>Temperature >= 80C
•Soft Tamper reset – card put into reset. Stays in reset until temperature lowers.
•Host Driver receives soft tamper interrupt and issues ALM (only one when soft tamper condition starts).
>Temperature < 80C
•Bootloader issues soft tamper ALM, then an ALM that HSM restart is required and waits for host reset.
•User receives ALM and goes to LunaCM/Lunash to do an “hsm restart” command.
•Bootloader starts – jumps to firmware.
•Firmware starts – no actions taken for the soft tamper. If temperature >= 75C, firmware re-issues ALM for activation of high temperature warning.
>Temperature < 75C
•Firmware issues ALM for deactivation of high temperature warning.
Temperature – High Hard Tamper
When the temperature starts below 75C and reaches high hard tamper limit of 88C and then drops back below 75C the following actions occur:
>Same as soft tamper described above up to when card is held in soft tamper reset
>Temperature > 88C
•Hard Tamper reset – Card in hard tamper reset for 5 seconds then returns to soft tamper reset. K7 HW does erase/reset of all internal temporary memory. Tamper chip latches time and type of tamper. Host driver receives hard tamper interrupt and issues ALM.
•HSM also erases auto-activation and STM data in tamper chip
•If decommission on tamper is enabled then key encryption data is erased in tamper chip as well
>Temperature < 80C
•Bootloader starts – issues hard tamper ALM and logs it in FRAM Log
•Bootloader issues ALM that HSM restart is required and waits for host reset.
•User receives ALM and goes to LunaCM/Lunash to perform an hsm restart command.
•Bootloader starts – jumps to firmware.
•Firmware starts – saves hard tamper latches. If controlled tamper recovery is enabled, firmware locks HSM commands to a minimal subset only, and issues ALM for HSM locked. User must go to LunaCM/Lunash and perform a “tamper clear” command to get a full HSM command set. When tamper clear is issued, firmware outputs an ALM for HSM unlocked.
•Firmware – issues deactivation and decommission (if enabled for tamper) ALMs
•Firmware – if temperature >= 75C, firmware re-issues ALM for activation of high temperature warning
>Temperature < 75C
•Firmware issues ALM for deactivation of high temperature warning
>Temperature < 80C
•Bootloader starts – issues hard tamper ALM
•Bootloader erases all of flash except for Boot Loader area and issues ALM for 'HSM permanently tampered'
•Bootloader issues ALM that 'HSM restart is required' and waits for host reset.
•User receives ALM and goes to LunaCM/Lunash to do an “hsm restart” command.
•Bootloader starts – Only bootloader commands are available. Bootloader again issues ALM for 'HSM permanently tampered'. User can dump the FRAM Log using LUNADIAG.
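The three high-temperature scenarios above differ only in which threshold is reached. The following Python sketch summarizes the thresholds used in those scenarios (warning at 75C and above, soft tamper at 80C and above, hard tamper above 88C). The function name and the returned labels are assumptions for illustration, and low-temperature limits are not covered.

```python
def classify_high_temperature(temperature_c):
    """Classify a temperature reading against the high-side limits.

    Thresholds are taken from the scenarios in this section; the labels
    returned are illustrative, not alarm message text.
    """
    if temperature_c > 88:
        return "hard tamper"
    if temperature_c >= 80:
        return "soft tamper"
    if temperature_c >= 75:
        return "high temperature warning"
    return "normal"
```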
Hard Tampers During Storage
When the HSM is powered off, its tamper detection is powered by the on-card battery, so some hard tampers can occur while main power is not applied. Some tamper conditions (for example, high or low temperature) might no longer be present when the HSM is powered back on, while others (for example, enclosure penetration or oscillator failure) might never clear. If a hard tamper occurs while in storage, then after the HSM is powered up, the bootloader runs and logs the tamper events to the FRAM Log and the serial port. Since the host K7 driver has not started yet, none of the messages from the bootloader are sent to the host, but other alarm messages are output later to notify the user.
•Bootloader waits for the host driver to be loaded
•When the host driver starts up it immediately resets the HSM causing the bootloader to run again
•Bootloader does not re-log the same tamper events
•Bootloader jumps to firmware, which outputs the ALM for the tamper event. If controlled tamper recovery is enabled, firmware also outputs an ALM indicating that the HSM is locked and a tamper clear is required. The user can then use LunaCM or Lunash to clear the tamper
NOTE If needed, use
Decommission with power on
If the HSM is powered on and a decommission is triggered either by the decommission switch or by a tamper (if decommission on tamper is enabled) then the HSM goes into reset for 5 seconds. The following alarm messages are output to FRAM Log, serial port, and host driver:
>The host driver immediately receives an interrupt and outputs an ALM for 'decommission triggered'
>After 5 seconds elapse, the bootloader starts running and also outputs an ALM for 'decommission triggered'
>Bootloader outputs an ALM for 'HSM restart required' and then waits
>User gets alarm notification and performs an HSM restart
>Bootloader restarts and jumps to firmware which finishes the decommission operations and firmware outputs an ALM for 'HSM decommissioned by firmware' and an ALM for 'HSM locked' (if enabled)
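A monitoring script could confirm that a decommission ran to completion by checking that the expected alarms appear in order. The Python sketch below is one way to do that. The mapping of the steps above to alarm IDs (0010, 1016, 1001, 2006, and optionally 2010) is read from the alarm table in this document and should be treated as an assumption; the names are hypothetical.

```python
# Assumed ID sequence for decommission with power on, read from the alarm
# table: Host Driver 0010, Boot Loader 1016 and 1001, Firmware 2006.
DECOMMISSION_POWER_ON = ["0010", "1016", "1001", "2006"]


def contains_in_order(observed_ids, expected_ids):
    """Return True if expected_ids occur within observed_ids in that order.

    Other alarms may be interleaved; only the relative order matters.
    """
    remaining = iter(observed_ids)
    # "expected in remaining" advances the iterator until a match is found,
    # so each expected ID must appear after the previous one.
    return all(expected in remaining for expected in expected_ids)
```

When controlled tamper recovery is enabled, "2010" could be appended to the expected list to also require the 'HSM locked' alarm.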
Decommission with power off
If the HSM is powered off and a decommission is triggered either by the decommission switch or by a tamper (if decommission on tamper is enabled) then the decommission is latched in the tamper chip. When the HSM is powered on the following alarm messages are output:
>Bootloader starts running and outputs an ALM for 'Decommission triggered' only to FRAM Log and serial port since the host driver is not loaded yet
>Bootloader waits for the driver to be loaded which then forces a host reset
>Bootloader restarts and jumps to firmware which finishes the decommission operations and firmware outputs an ALM for 'HSM decommissioned by firmware' and an ALM for 'HSM locked' (if enabled)
NOTE If needed, use
Chassis open with power on
If the HSM is powered on and the chassis open switch is triggered, then a cold reset is performed on the HSM, which effectively removes the HSM from the PCIe bus. After about 10 seconds the HSM is released from reset and the following alarm messages are output:
>Host Driver notices the device is no longer present on the PCIe bus and outputs an ALM for 'HSM missing from PCIe bus'
>Bootloader starts running and outputs an ALM for 'HSM chassis opened' only to FRAM Log and serial port
>Bootloader waits for the driver to be loaded
>User gets notification of missing HSM and powers the host system off and then on
>Bootloader starts running and does not re-log the same tamper events
>Bootloader waits for the host driver to be loaded
>When the host driver starts up it immediately resets the HSM causing Bootloader to run again
>Bootloader jumps to firmware which finishes the chassis opened operations and firmware outputs an ALM for 'HSM chassis opened' and an ALM for 'HSM locked' (if enabled).
NOTE If the chassis is still open then the HSM performs a cold reset after the tampers are cleared by firmware.
If needed, use
Chassis open with power off
If the HSM is powered off and the chassis open switch is triggered, then the chassis open is latched in the tamper chip. When the HSM is powered on the following alarm messages are output:
>Bootloader starts running and outputs an ALM for 'HSM chassis opened' only to FRAM Log and serial port
>Bootloader waits for the driver to be loaded which then forces a host reset
>Bootloader starts running and does not re-log the same tamper events
>Bootloader jumps to firmware which finishes the chassis opened operations and firmware outputs an ALM for 'HSM chassis opened' and an ALM for 'HSM locked' (if enabled)
NOTE If the chassis is still open then the HSM performs a cold reset after the tampers are cleared by firmware.
Card removal
When an HSM is powered off and removed from the chassis a card removal latch is saved in the tamper chip. When the HSM is powered on the following alarm messages are output:
>Bootloader starts running and outputs an ALM for 'card removal' only to FRAM Log and serial port
>Bootloader waits for the driver to be loaded which then forces a host reset
>Bootloader starts running and does not re-log the same tamper events
>Bootloader restarts and jumps to firmware which outputs an ALM for 'HSM was removed from the chassis' and an ALM for 'HSM locked' (if enabled)
NOTE If needed, use
Stored Data Integrity
The HSM performs data integrity checks at startup and during runtime.
Startup
If a check fails during startup, meaning that an object stored in flash memory was corrupted, then ALM 2024 is generated, along with additional log messages, and the HSM firmware halts:
k7pf0: [HSM] ALM2024: Stored data integrity verify error
... additional messages that might include "LOG (SEVERE)" and "LOG (CRITICAL)", "Fatal error", and possibly also
k7pf0: [HSM] ALM2009: HSM halted - internal firmware error
What to do
1.Restart the HSM.
2.If the ALM persists, cycle the power to the HSM.
3.If the ALM persists, zeroize the HSM.
4.If the ALM persists, contact Support.
Runtime
If a check fails during runtime, meaning that an object stored in volatile memory was corrupted, then ALM 2024 is generated, along with log messages, and the HSM is unable to perform any actions that involve the corrupted object:
k7pf0: [HSM] ALM2024: Stored data integrity verify error
... additional messages that might include "LOG (SEVERE)"
What to do
1.Try restarting the HSM.
2.If a Stored Data Integrity (SDI) alarm occurs during startup, see the section about "Startup", above.
3.If no SDI alarm occurs during startup, but an SDI alarm occurs later, contact Support.
Appliance reports out-of-service (OOS) code 30
Anything that halts the firmware (such as ALM_2004, ALM_2009, ALM_2026) results in an out-of-service code 30. Other critical events that halt the firmware include:
>failed self-test
>failure in the random number generator
>failure in integrity of the bootloader
>failure in integrity of the firmware
>failure in integrity of the HSM memory
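A log monitor could flag a likely out-of-service code 30 as soon as one of the firmware-halting alarms appears. The Python sketch below uses only the three alarm IDs named in this section as examples; the set is therefore not exhaustive, and the names are assumptions for illustration.

```python
# Alarm IDs this section names as examples of firmware-halting events.
# The other halting events listed above have no alarm ID given here,
# so this set is knowingly incomplete.
FIRMWARE_HALTING_ALARMS = {"2004", "2009", "2026"}


def implies_oos_30(alarm_ids):
    """Return True if any observed alarm ID is a known firmware-halting alarm."""
    return any(alarm_id in FIRMWARE_HALTING_ALARMS for alarm_id in alarm_ids)
```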
Status codes for appliance and for contained cryptographic module
(This table concerns the Luna Network HSM 7 appliance, and content that is repeated on Front-panel LCD Display. It does not reflect the standalone Luna PCIe HSM 7 or the Luna USB HSM 7, because only some codes reflect cryptographic module status, while others relate to the condition of the network appliance product that contains, and provides network connectivity to, the crypto module.)
The statuses in the table, below, are displayed on the appliance front panel and are recorded in system logs that you can collect and parse remotely.
State | Status | Description |
---|---|---|
ISO | 0 | In Service Operational. No trouble. |
ISO | 60 | In Service Operational. The eth0 interface is offline. Use lunash:> network show and lunash:> service status network to display more information about the status of the network interfaces. |
ISO | 61 | In Service Operational. The eth1 interface is offline. Use lunash:> network show and lunash:> service status network to display more information about the status of the network interfaces. |
ISO | 62 | In Service Operational. The eth2 interface is offline. Use lunash:> network show and lunash:> service status network to display more information about the status of the network interfaces. |
ISO | 63 | In Service Operational. The eth3 interface is offline. Use lunash:> network show and lunash:> service status network to display more information about the status of the network interfaces. |
ISO | 80 | In Service Operational. The STC service is not running. Use lunash:> service status stc to display more information about the status of the STC service. |
ISO | 95 | In Service Operational. The webserver service is not running. The REST API is not available. Use lunash:> service status webserver to display more information about the status of the webserver service. |
ISO | 100 | In Service Operational. The SNMP service is not running. Use lunash:> service status snmp to display more information about the status of the SNMP subsystem. |
OOS | 20 | Out of Service. The NTLS service is not running. Use lunash:> service status ntls to display more information about the status of the NTLS service. |
OOS | 25 | Out of Service. The NTLS service is not bound to an Ethernet device. Use lunash:> service status ntls to display more information about the status of the NTLS service, and lunash:> syslog tail to view the system logs to help troubleshoot the issue. |
OOS | 30 | Out of Service. The HSM service has experienced one or more errors or critical events. Use lunash:> hsm information show and lunash:> syslog tail to help troubleshoot the issue. |
OFL | 50 | Offline. None of the Ethernet interfaces are connected to the network. Use lunash:> network show to display more information about the status of the network, and lunash:> syslog tail to view the system logs to help troubleshoot the issue. NOTE Prior to Luna Appliance Software 7.8.3, this code is incorrectly displayed as OFT (see resolved issue LUNA-28763). |
IST | 70 | In Service Trouble. The syslog service is not running. Use lunash:> service status syslog to display more information about the status of the syslog service, and lunash:> syslog tail to view the system logs to help troubleshoot the issue. |
IST | 90 | In Service Trouble. The SSH service is not running. Use lunash:> service status ssh to display more information about the status of the SSH service, and lunash:> syslog tail to view the system logs to help troubleshoot the issue. |
IST | 110 | In Service Trouble. Hard disk utilization is too high. Use lunash:> syslog tarlogs to create a tar archive of the logs and then use pscp to transfer the log archive from the appliance to a remote computer for archiving. |
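Because these statuses are recorded in system logs that can be collected and parsed remotely, a remote tool might translate a status code into a readable summary. The Python sketch below transcribes the codes from the table above into a lookup. The dictionary and function names, the shortened summaries, and the output format are assumptions for illustration; the way the code appears in an actual log line is not specified here.

```python
# Status codes transcribed from the table above: code -> (state, summary).
APPLIANCE_STATUS = {
    0: ("ISO", "no trouble"),
    60: ("ISO", "eth0 interface is offline"),
    61: ("ISO", "eth1 interface is offline"),
    62: ("ISO", "eth2 interface is offline"),
    63: ("ISO", "eth3 interface is offline"),
    80: ("ISO", "STC service is not running"),
    95: ("ISO", "webserver service is not running"),
    100: ("ISO", "SNMP service is not running"),
    20: ("OOS", "NTLS service is not running"),
    25: ("OOS", "NTLS service is not bound to an Ethernet device"),
    30: ("OOS", "HSM service has experienced errors or critical events"),
    50: ("OFL", "no Ethernet interface is connected to the network"),
    70: ("IST", "syslog service is not running"),
    90: ("IST", "SSH service is not running"),
    110: ("IST", "hard disk utilization is too high"),
}

STATE_NAMES = {
    "ISO": "In Service Operational",
    "OOS": "Out of Service",
    "OFL": "Offline",
    "IST": "In Service Trouble",
}


def describe_status(code):
    """Return a one-line summary for an appliance status code."""
    entry = APPLIANCE_STATUS.get(code)
    if entry is None:
        return f"Unknown status code {code}"
    state, summary = entry
    return f"{state} {code}: {STATE_NAMES[state]}, {summary}"
```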