Alarms
Alarms are CipherTrust Manager's mechanism for notifying administrators when CipherTrust Manager or one of its nodes is unhealthy or insecurely configured and should be investigated. Alarms can be retrieved through the REST API, the CLI, or the GUI.
Alarm states are listed in the following table:
State | Description |
---|---|
off | The alarm is inactive and does not need to be investigated. |
on | The alarm is active and should be investigated. It remains active until the condition that triggered it is no longer valid. |
unknown | The alarm's state could not be determined and should be investigated. Typically this occurs when the service responsible for triggering the alarm fails to report its state (for example, because the service is down). |
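The table implies a simple triage rule: `on` and `unknown` both call for investigation, while `off` does not. A minimal Python sketch of that rule follows; the `AlarmState` enum and `needs_investigation` helper are hypothetical names for illustration, not part of any CipherTrust Manager API.

```python
from enum import Enum


class AlarmState(Enum):
    """Alarm states from the state table (hypothetical helper type)."""
    OFF = "off"          # inactive; nothing to investigate
    ON = "on"            # active until the triggering condition clears
    UNKNOWN = "unknown"  # state could not be determined (e.g. service down)


def needs_investigation(state: str) -> bool:
    """Return True for the states the table says should be investigated.

    Raises ValueError for a string that is not one of the three states.
    """
    return AlarmState(state) in (AlarmState.ON, AlarmState.UNKNOWN)


if __name__ == "__main__":
    for s in ("off", "on", "unknown"):
        print(s, needs_investigation(s))
```

Treating `unknown` like `on` errs on the side of caution: an alarm whose source service cannot report is surfaced rather than silently ignored.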
The CipherTrust Manager has the following built-in alarms:
Name | Severity | Trigger | Remediation |
---|---|---|---|
Disk Full | Critical | The root file system's used-space percentage exceeds the configured threshold of 80%. | Increase the size of the node's disk, partition, or file system, or replace the node with a new node that has sufficient storage. |
NAE TLS Disabled | Critical | The system is started with the NAE interface mode configured not to use TLS (see Interfaces). | Change the NAE interface mode (see Interfaces) to one of the options that uses TLS, then restart the services (see Services). |
HSM Offline | Critical | The system has been unable to access the HSM for more than 15 seconds. NOTE: If connectivity to the HSM is not restored within 5 minutes, all services are shut down until the HSM becomes available again. | Restore connectivity to the HSM. |
Cluster Node Down | Critical | A node in the cluster is down. | Restore the down node's connection to the cluster. |
Cluster Node Certificate Expiration | Critical | An internal CipherTrust Manager certificate used for database access and clustering has expired or will expire within 30 days. | Reboot the node when the alarm triggers. A new certificate is generated automatically after the reboot. |
Syslog Connection Offline | Critical | A Syslog connection goes offline. | Restore connectivity to the specified Syslog connection. |
License Violation | Critical | A Virtual CipherTrust Manager k170v instance has more than four CPUs assigned to it. | Upgrade to a valid k470 license, or assign four or fewer CPUs to the k170v. |
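The numeric triggers in the table can be restated as simple predicates, which can help when reasoning about when an alarm should fire. The following Python sketch is illustrative only: it is not how CipherTrust Manager evaluates its alarms internally, every function name is hypothetical, and `root_disk_full` uses `shutil.disk_usage`, whose used/total ratio may differ slightly from the figure the appliance reports.

```python
import shutil
from datetime import datetime, timedelta, timezone

# Thresholds copied from the built-in alarm table.
DISK_FULL_PERCENT = 80                     # Disk Full
HSM_ALARM_AFTER = timedelta(seconds=15)    # HSM Offline alarm
HSM_SHUTDOWN_AFTER = timedelta(minutes=5)  # services shut down
CERT_WARNING_WINDOW = timedelta(days=30)   # Cluster Node Certificate Expiration
K170V_MAX_CPUS = 4                         # License Violation


def disk_full(used_bytes: int, total_bytes: int) -> bool:
    """True when used space exceeds 80% of the file system (integer math, no rounding)."""
    return used_bytes * 100 > DISK_FULL_PERCENT * total_bytes


def root_disk_full(path: str = "/") -> bool:
    """Approximate the Disk Full check for a local file system."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return disk_full(usage.used, usage.total)


def hsm_status(offline_for: timedelta) -> str:
    """Map how long the HSM has been unreachable to the documented behavior.

    Assumes both windows are measured from the moment connectivity was lost.
    """
    if offline_for > HSM_SHUTDOWN_AFTER:
        return "shutdown"  # all services stop until the HSM is back
    if offline_for > HSM_ALARM_AFTER:
        return "alarm"     # HSM Offline alarm is on
    return "ok"


def cert_alarm(not_after: datetime, now: datetime) -> bool:
    """True if the certificate has already expired or expires within 30 days."""
    return not_after - now <= CERT_WARNING_WINDOW


def license_violation(model: str, cpu_count: int) -> bool:
    """True for a k170v instance with more than four CPUs assigned."""
    return model == "k170v" and cpu_count > K170V_MAX_CPUS


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("root file system over 80%:", root_disk_full())
    print("cert expiring in 10 days raises alarm:", cert_alarm(now + timedelta(days=10), now))
```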
CipherTrust Manager also has dynamic alarms that are triggered by audit record conditions. For details, refer to Configuring alarm triggers based on an audit record.
To list all alarms:
$ ksctl alarms list
Example response:
{
  "skip": 0,
  "limit": 10,
  "total": 4,
  "resources": [
    {
      "name": "Disk Full",
      "state": "off",
      "state_change_at": "2018-07-03T18:30:31.430320038Z",
      "description": "Disk full alarm triggers above 80% of capacity (currently at 48%)",
      "severity": "critical"
    },
    {
      "name": "NAE TLS Disabled",
      "state": "off",
      "state_change_at": "2018-07-03T18:30:57.28077419Z",
      "description": "TLS is disabled on the NAE interface",
      "severity": "critical"
    },
    {
      "name": "HSM Offline",
      "state": "off",
      "state_change_at": "2018-07-05T20:02:58.326696594Z",
      "description": "HSM is offline",
      "severity": "critical"
    },
    {
      "name": "Cluster Node Down",
      "state": "on",
      "state_change_at": "2018-07-05T20:02:58.326696594Z",
      "description": "Current down node(s): 127.0.0.1:5555",
      "severity": "critical"
    }
  ]
}
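Because `ksctl alarms list` prints JSON, its output can be post-processed by a script, for example to show only the alarms that need attention. A minimal Python sketch, assuming the response shape shown in the example (a top-level `resources` list whose entries carry `name`, `state`, and `severity`); the function names are hypothetical.

```python
import json


def active_alarms(response_text: str) -> list[dict]:
    """Return the alarms whose state is 'on' or 'unknown'.

    Assumes the response shape shown in the example: a top-level "resources"
    list whose entries carry "name", "state", and "severity" keys.
    """
    response = json.loads(response_text)
    return [
        alarm
        for alarm in response.get("resources", [])
        if alarm.get("state") in ("on", "unknown")
    ]


def summarize(response_text: str) -> list[str]:
    """Format one line per alarm needing attention, e.g. 'critical  Cluster Node Down (on)'."""
    return [
        f'{alarm.get("severity", "?")}  {alarm.get("name", "?")} ({alarm.get("state")})'
        for alarm in active_alarms(response_text)
    ]
```

Against the example response, `summarize` returns a single line for the Cluster Node Down alarm, the only one whose state is `on`. To feed it live data, the CLI output could be captured with `subprocess.run(["ksctl", "alarms", "list"], capture_output=True, text=True, check=True).stdout`, assuming `ksctl` is installed and configured. Only the first page of results (`limit` entries) is inspected.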
To acknowledge that an alarm is under investigation:
$ ksctl alarms acknowledge -i 1650789b-e8e6-4915-91ab-b067e073f39f
To clear (i.e. turn off) an alarm:
$ ksctl alarms clear -i 1650789b-e8e6-4915-91ab-b067e073f39f
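To acknowledge or clear several alarms from a script, the two commands can be assembled programmatically. The sketch below only builds the `ksctl alarms acknowledge -i <id>` and `ksctl alarms clear -i <id>` invocations documented here and, when asked, runs them with `subprocess`. It assumes `ksctl` is on the PATH and already configured; the helper names are hypothetical, and the alarm ID must come from your own listing (the example response does not show an ID field).

```python
import subprocess


def alarm_command(action: str, alarm_id: str) -> list[str]:
    """Build the ksctl argument list for acknowledging or clearing an alarm."""
    if action not in ("acknowledge", "clear"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["ksctl", "alarms", action, "-i", alarm_id]


def run_alarm_command(action: str, alarm_id: str, dry_run: bool = True) -> list[str]:
    """Return the command; execute it only when dry_run is False."""
    cmd = alarm_command(action, alarm_id)
    if not dry_run:
        # check=True raises CalledProcessError if ksctl exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: prints the command instead of executing it.
    print(" ".join(run_alarm_command("acknowledge", "1650789b-e8e6-4915-91ab-b067e073f39f")))
```

Defaulting to a dry run makes it harder to clear alarms by accident; pass `dry_run=False` only after checking the printed commands.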