Alarms
Alarms are CipherTrust Manager's mechanism for alerting administrators to problems with the state of the CipherTrust Manager, one of its nodes, or select clients such as CTE. An alarm is raised when Server or Client Records indicate that CipherTrust Manager or a client is unhealthy or is not configured securely. Alarms can be fetched through the REST API, the CLI, or the GUI.
Alarm states are listed in the following table:
State | Description |
---|---|
off | The alarm is inactive and does not need to be investigated. |
on | The alarm is active and should be investigated. It remains active until the condition that triggered it is no longer valid. |
unknown | The alarm's state could not be determined and should be investigated. Typically this occurs when the service responsible for triggering the alarm failed to communicate its state (e.g. the service is down). |
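As a minimal sketch of how a monitoring script might act on these states, the snippet below treats both `on` and `unknown` as needing attention, as the table describes. The helper name `needs_investigation` is hypothetical and not part of any CipherTrust Manager tooling.

```python
# States that the table above says should be investigated.
INVESTIGATE_STATES = {"on", "unknown"}


def needs_investigation(state: str) -> bool:
    """Return True when an alarm in this state should be looked at."""
    return state.strip().lower() in INVESTIGATE_STATES


if __name__ == "__main__":
    for state in ("off", "on", "unknown"):
        print(f"{state}: investigate={needs_investigation(state)}")
```

Treating `unknown` the same as `on` is a deliberately cautious choice: an alarm whose state cannot be determined may be hiding a service that is down.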
The CipherTrust Manager has the following built-in alarms for server problems:
Name | Severity | Trigger | Remediation |
---|---|---|---|
Disk Full | Critical | The root file system's 'used space' percentage exceeds the configured threshold of 80%. | Increase the node's disk/partition/file-system, or replace the node with a new node that has sufficient storage. |
NAE TLS Disabled | Critical | The system is started with NAE's interface mode configured to not use TLS <link to interfaces>. | Modify NAE's interface mode <link to interfaces> to one of the options that specifies TLS and restart <link to services>. |
HSM Offline | Critical | The system has been unable to access the HSM for more than 15 seconds. NOTE: If connectivity to the HSM is not restored within 5 minutes, all services are shut down until the HSM becomes available. | Restore connectivity to the HSM. |
Cluster Node Down | Critical | A node within a cluster is down. | Restore connection of the down node to the cluster. |
Cluster Node Certificate Expiration | Critical | Automatic renewal failed for an internal CipherTrust Manager certificate that is used for database access. | Re-join the node to the cluster to retrieve a current cluster certificate. In the Loki audit records, look for the cluster node that logged the message "Cluster cert about to expire ..." followed by a date. This is the node to re-join to the cluster. |
Syslog Connection Offline | Critical | A Syslog connection goes offline. | Restore connectivity to the specified Syslog connection. |
Deprecated TLS version Enabled | Critical | TLSv1.0 or TLSv1.1 is set as the minimum TLS version on the CipherTrust Manager for the NAE-KMIP interface. | Set the minimum TLS version to TLSv1.2 or higher. |
License Violation | Critical | A Virtual CipherTrust Manager k170v instance has more than four CPUs assigned to it. | Upgrade to a valid k470 license, or assign four or fewer CPUs to the k170v. |
License Expiration | Warning | One or more of the licenses is set to expire in fewer than 90 days. The description indicates the licenses and number of days until expiry. |
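The numeric triggers in the table can be illustrated with a short sketch. It only reproduces the comparisons stated above (used space exceeding 80%, fewer than 90 days to license expiry). How CipherTrust Manager actually measures used space is not stated in this document, so the `used / total` formula and all function names here are assumptions for illustration.

```python
import shutil

DISK_FULL_THRESHOLD_PERCENT = 80  # from the Disk Full row
LICENSE_WARNING_DAYS = 90         # from the License Expiration row


def used_percent(total_bytes: int, used_bytes: int) -> float:
    """Used space as a percentage of total space (assumed formula)."""
    return 100.0 * used_bytes / total_bytes


def disk_is_full(total_bytes: int, used_bytes: int) -> bool:
    """True when used space exceeds the threshold (strictly greater)."""
    return used_percent(total_bytes, used_bytes) > DISK_FULL_THRESHOLD_PERCENT


def license_expiring(days_left: int) -> bool:
    """True when fewer than 90 days remain before expiry."""
    return days_left < LICENSE_WARNING_DAYS


if __name__ == "__main__":
    # Check the root file system of the machine running this script.
    usage = shutil.disk_usage("/")
    print(f"root used: {used_percent(usage.total, usage.used):.1f}%")
    print(f"would trigger Disk Full: {disk_is_full(usage.total, usage.used)}")
```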
The CipherTrust Manager server also has dynamic alarms that are triggered based on server or client record conditions. Refer to: Configuring alarm triggers based on a record.
Consult the documentation for the specific connector for information on interpreting client alarms.
To list all alarms:
$ ksctl alarms list
Example response:
{
  "skip": 0,
  "limit": 10,
  "total": 8,
  "resources": [
    {
      "name": "License Violation",
      "state": "off",
      "triggeredAt": "2021-09-30T10:07:52.5529Z",
      "description": "Alarm triggers if CPU count exceeds the limit allowed in the license",
      "severity": "critical"
    },
    {
      "name": "Cluster Node Certificate Expiration",
      "state": "off",
      "triggeredAt": "2021-09-30T10:07:52.547729Z",
      "description": "Alarm triggers 30 days to certificate expiration (the certificate is currently valid for 2.0 years)",
      "severity": "critical"
    },
    {
      "name": "Cluster Node Down",
      "state": "off",
      "triggeredAt": "2021-09-30T10:07:52.541277Z",
      "description": "Cluster nodes down alarm triggers when any node is down (currently all nodes are up)",
      "severity": "critical"
    },
    {
      "name": "Disk Full",
      "state": "off",
      "triggeredAt": "2021-09-30T10:07:47.535595Z",
      "description": "Disk full alarm triggers above 80% of capacity (currently at 47%)",
      "severity": "critical"
    },
    {
      "name": "HSM Offline",
      "state": "off",
      "triggeredAt": "2021-09-30T10:07:44.730768Z",
      "description": "HSM is offline",
      "severity": "critical"
    },
    {
      "name": "Deprecated TLS version Enabled",
      "state": "unknown",
      "triggeredAt": "2021-09-30T10:07:37.527468Z",
      "description": "Deprecated TLS version(TLSv1.0/1.1) enabled on NAE/KMIP interface",
      "severity": "critical"
    },
    {
      "name": "KMIP Debug Logs Unmask Enabled",
      "state": "off",
      "triggeredAt": "2021-09-30T10:07:27.516841Z",
      "description": "Unmasking of sensitive data for KMIP debug logs is enabled",
      "severity": "critical"
    },
    {
      "name": "NAE TLS Disabled",
      "state": "off",
      "triggeredAt": "2021-09-30T10:07:17.510478Z",
      "description": "TLS is disabled on the NAE interface",
      "severity": "critical"
    }
  ]
}
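A response of this shape can be post-processed with a few lines of Python. The sketch below assumes that `skip`, `limit`, and `total` are ordinary pagination fields (offset, page size, and overall count), which the example suggests but this document does not state. The function name `summarize_page` and the trimmed sample are illustrative only.

```python
import json


def summarize_page(page: dict) -> dict:
    """List alarms that are not 'off' and report whether more pages remain."""
    resources = page["resources"]
    attention = [a["name"] for a in resources if a["state"] != "off"]
    # Assumption: 'skip' is the offset of this page and 'total' the full count.
    more_pages = page["skip"] + len(resources) < page["total"]
    return {"attention": attention, "more_pages": more_pages}


# Trimmed sample built from two entries of the example response.
SAMPLE = json.loads("""
{
  "skip": 0,
  "limit": 10,
  "total": 2,
  "resources": [
    {"name": "Disk Full", "state": "off", "severity": "critical"},
    {"name": "Deprecated TLS version Enabled", "state": "unknown",
     "severity": "critical"}
  ]
}
""")

if __name__ == "__main__":
    print(summarize_page(SAMPLE))
```

In the full example response, the only alarm this would flag is "Deprecated TLS version Enabled", whose state is `unknown`.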
To acknowledge that an alarm is under investigation:
$ ksctl alarms acknowledge -i 1650789b-e8e6-4915-91ab-b067e073f39f
To clear (i.e. turn off) an alarm:
$ ksctl alarms clear -i 1650789b-e8e6-4915-91ab-b067e073f39f
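When these commands are driven from a script, it can help to validate the input before invoking the CLI. The following sketch builds the argument list for the two commands shown above and nothing more. It assumes alarm IDs are always UUIDs, as the example ID appears to be. The helper name `build_alarm_command` is hypothetical.

```python
import uuid

# Only the two subcommands shown in this section are supported here.
ALLOWED_ACTIONS = ("acknowledge", "clear")


def build_alarm_command(action: str, alarm_id: str) -> list:
    """Return the ksctl argument list for acknowledging or clearing an alarm."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Assumption: alarm IDs are UUIDs; uuid.UUID raises ValueError otherwise.
    uuid.UUID(alarm_id)
    return ["ksctl", "alarms", action, "-i", alarm_id]


if __name__ == "__main__":
    cmd = build_alarm_command("acknowledge", "1650789b-e8e6-4915-91ab-b067e073f39f")
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if the ID comes from untrusted input.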