Clusters and Nodes
Note
This feature is not included in Community Edition and requires a valid Virtual CipherTrust Manager license. To activate your instance with a trial evaluation, or a term or perpetual license, see Licensing.
Clusters and nodes are the resources used to create and manage CipherTrust Manager clustering.
A cluster is a group of connected CipherTrust Manager appliances that share data. A cluster node is an individual appliance within the cluster.
The main purpose of clustering is to support High Availability. When clustered, all CipherTrust Manager appliances in the cluster (called cluster nodes or nodes) continuously synchronize their databases with each other; any appliance in the cluster may be contacted for any operation. A cluster supports any combination of appliance models; both virtual and physical appliances can join the same cluster. However, all cluster nodes must run the same firmware version.
Note
Cluster Node Limitation - The maximum number of cluster nodes is 20. Joining a 21st node will fail.
If a member of the cluster gets temporarily disconnected, an alarm is set and syslog messages are sent. When the node comes back, it synchronizes with the other members of the cluster. If a member of the cluster gets permanently disconnected, notify the other members by sending a ksctl cluster nodes delete command to one of them; the others will inform each other. This prevents the remaining nodes from continually storing catch-up changes for the missing node, which would eventually fill up their local volumes.
Nodes in a cluster communicate over port 5432. Members of the cluster must have bi-directional access to each other on port 5432.
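Before joining nodes, it can be useful to confirm that each node can reach the others on port 5432. The following Python sketch is illustrative only (it is not part of the product, and is demonstrated against a local listener); in practice you would run a check like this from each node against every other node's address and port 5432:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demonstration against a local listener rather than a real cluster node.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))        # pick a free ephemeral port
listener.listen(1)
port = listener.getsockname()[1]

reachable_while_open = port_reachable("127.0.0.1", port)
listener.close()
reachable_after_close = port_reachable("127.0.0.1", port, timeout=1.0)
```

Remember that reachability must be bi-directional: a successful check from node A to node B does not imply that B can reach A.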
Note
For proper cluster functionality, every node must be synchronized to the same time. Otherwise, communication between nodes fails. You must manually configure the same NTP server for each node. The NTP server configuration is not replicated, so you must configure each node independently.
Use the ksctl command line interface for managing clusters and nodes. The relevant commands are:
ksctl cluster csr
ksctl cluster delete
ksctl cluster fulljoin
ksctl cluster info
ksctl cluster join
ksctl cluster new
ksctl cluster nodes list
ksctl cluster nodes get
ksctl cluster nodes create
ksctl cluster nodes delete
Note
All global flags can be configured using a configuration file. Otherwise, you will need to specify the URL, username, and password for each call, as demonstrated in a few of the examples below.
Note
The cluster commands require an IP address (or Hostname) of both the cluster member and of the joining node. It is important to note that these will be used by the nodes to talk to each other, and not by the CLI to talk to the nodes. For example, in AWS if the nodes are on the same VPC, the internal IP address of the member and joining node should be used, not a public IP.
What is not clustered?
Most resources are clustered. However, the following node-specific items are not clustered, that is, they are not replicated across the cluster:
Backup files
Backup keys
Debug, KMIP Activity, and NAE Activity logs
NTP configuration
HSM configuration
Instance name
Virtual CipherTrust Manager license
Interface certificates
Proxy settings
Note
The 'Connector Lock Code', unlike the 'Key Manager Lock Code', is cluster-wide. This means that a license applied using the Connector Lock Code is replicated across the cluster.
Sharing HSM Root of Trust Configuration
CipherTrust Manager internally uses a flexible security architecture to allow clustering instances with both shared and distinct 'root of trust' configurations.
If the member node and the new node share the same HSM partition, it is recommended to pass the --shared-hsm-partition flag during nodes create or fulljoin, and to set each node to use the same Root of Trust (RoT) key using ksctl rot-keys rotate --id <root-of-trust-id>. This configuration further increases the security of the cluster join procedure by ensuring that CipherTrust Manager secrets and master keys are protected by an HSM-secured RoT key during the join process, when these objects are transferred between nodes for the first time.
When the new node uses a distinct HSM partition, or does not use any HSM, it can still join an existing cluster; simply omit the --shared-hsm-partition flag.
Caution
Each CipherTrust Manager node in the cluster uses its own root of trust configuration to protect the Key Encryption Key (KEK) chain that secures the sensitive data of the cluster. When a new node, not connected to an HSM, joins a CipherTrust Manager cluster with all HSM connected nodes, it becomes the weakest point in the key hierarchy and potentially weakens the overall security of the cluster.
Managing Clusters and Nodes
To check the status of a cluster
To check the status of a cluster, enter the command:
ksctl cluster info
This returns the following response:
{
"nodeID": "",
"status": {
"code": "none",
"description": "not clustered"
},
}
To create a new cluster
Make the following call on a running appliance, inserting the hostname/IP of the appliance. When running cluster new, --host and --url are usually identical, except that --host does not include the protocol. To create a new cluster, enter the command:
ksctl cluster new --host=localHostName --url=urlOfCurrentNode --user=username --password=Password_1
This returns the following response:
{
"nodeID": "ab40e178-5f1d-4f03-8b26-7ca378f74988",
"status": {
"code": "r",
"description": "ready"
},
"nodeCount": 1
}
To show the status of all nodes in the cluster
Enter the following command on any node:
ksctl cluster nodes list
This returns the following response:
{
"skip": 0,
"limit": 256,
"total": 1,
"resources": [
{
"nodeID": "c05ead39-94ac-4459-a371-9915e3e16ebf",
"status": {
"code": "r",
"description": "ready"
},
"host": "kylo_pg_1",
"isThisNode": true
}
]
}
The possible statuses are:
nodes are in different states across databases
not clustered
ready
joining: creating (1/5)
joining: bootstrapping (2/5)
joining: initial sync (3/5)
joining: catching up (4/5)
joining: completing (5/5)
killed
down
unknown
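A script can parse the nodes list response and flag any node that is not ready. The sketch below is illustrative only (the field names come from the sample response above; the helper function is not part of ksctl):

```python
import json

# Sample output in the shape returned by `ksctl cluster nodes list`.
raw = '''
{
  "skip": 0, "limit": 256, "total": 2,
  "resources": [
    {"nodeID": "c05ead39", "status": {"code": "r", "description": "ready"},
     "host": "node-a", "isThisNode": true},
    {"nodeID": "ebab0738", "status": {"code": "d", "description": "down"},
     "host": "node-b", "isThisNode": false}
  ]
}
'''

def unhealthy_nodes(response: dict) -> list:
    """Return (host, status description) for every node that is not ready."""
    return [(n["host"], n["status"]["description"])
            for n in response["resources"]
            if n["status"]["description"] != "ready"]

problems = unhealthy_nodes(json.loads(raw))
```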
To show the status of a single node in the cluster
To show the status of a single node in a cluster, enter the command:
ksctl cluster nodes get --id=c05ead39-94ac-4459-a371-9915e3e16ebf
This returns the following response:
{
"nodeID": "c05ead39-94ac-4459-a371-9915e3e16ebf",
"status": {
"code": "r",
"description": "ready"
},
"host": "kylo_pg_1",
"isThisNode": true
}
To show a summary of all nodes in the cluster
To show a summary of all nodes in a cluster, run:
ksctl cluster summary
This returns the following response:
{
"skip": 0,
"limit": 0,
"total": 3,
"resources": {
"2dda1a9e4de4438b9fab1c7efc95bea8": {
"clusterSummary": {
"2dda1a9e4de4438b9fab1c7efc95bea8": {
"clusterErrors": null,
"nodeInfo": {
"host": "54.224.193.194",
"isThisNode": true,
"nodeID": "2dda1a9e4de4438b9fab1c7efc95bea8",
"port": 5432,
"publicAddress": "54.224.193.194",
"status": {
"code": "r",
"description": "ready"
}
},
"summary": "GOOD"
},
"f374bb3eab384692843e8dc60a9c2da2": {
"clusterErrors": null,
"nodeInfo": {
"applyRate": 2500,
"catchupInterval": "00:00:00",
"connectTime": "2022-08-24T05:40:23.706218Z",
"connected": true,
"flushLSN": "0/50D6ED0",
"flushLag": "00:00:00.000873",
"flushLagBytes": 0,
"flushLagSize": "0 bytes",
"host": "54.144.33.167",
"isThisNode": false,
"nodeID": "f374bb3eab384692843e8dc60a9c2da2",
"port": 5432,
"publicAddress": "54.144.33.167",
"replayLSN": "0/50D6ED0",
"replayLag": "00:00:00.000873",
"replayLagBytes": 0,
"replayLagSize": "0 bytes",
"replicationBlocked": false,
"sentLSN": "0/50D6ED0",
"sentLagBytes": 0,
"sentLagSize": "0 bytes",
"state": "streaming",
"status": {
"code": "r",
"description": "ready"
},
"uptime": "00:03:42.332005",
"writeLSN": "0/50D6ED0",
"writeLag": "00:00:00.000873",
"writeLagBytes": 0,
"writeLagSize": "0 bytes"
},
"summary": "GOOD"
},
"f50adf6919c047f0b83c2bf73553f962": {
"clusterErrors": null,
"nodeInfo": {
"applyRate": 2504,
"catchupInterval": "00:00:00",
"connectTime": "2022-08-24T05:28:49.73947Z",
"connected": true,
"flushLSN": "0/50D6ED0",
"flushLag": "00:00:00.001327",
"flushLagBytes": 0,
"flushLagSize": "0 bytes",
"host": "18.234.75.110",
"isThisNode": false,
"nodeID": "f50adf6919c047f0b83c2bf73553f962",
"port": 5432,
"publicAddress": "18.234.75.110",
"replayLSN": "0/50D6ED0",
"replayLag": "00:00:00.001327",
"replayLagBytes": 0,
"replayLagSize": "0 bytes",
"replicationBlocked": false,
"sentLSN": "0/50D6ED0",
"sentLagBytes": 0,
"sentLagSize": "0 bytes",
"state": "streaming",
"status": {
"code": "r",
"description": "ready"
},
"uptime": "00:15:16.298753",
"writeLSN": "0/50D6ED0",
"writeLag": "00:00:00.001327",
"writeLagBytes": 0,
"writeLagSize": "0 bytes"
},
"summary": "GOOD"
}
},
"lastUpdated": "2022-08-24T05:44:06.112975Z"
},
"f50adf6919c047f0b83c2bf73553f962": {
"clusterSummary": {
"2dda1a9e4de4438b9fab1c7efc95bea8": {
"clusterErrors": [
{
"errorMessage": "could not connect to the postgresql server in replication mode: timeout expired\n",
"errorTime": "2022-08-24T06:20:36.959429Z"
}
],
"nodeInfo": {
"applyRate": 945,
"catchupInterval": "00:00:00",
"connected": false,
"flushLSN": "0/3E1A700",
"flushLag": "00:00:00.00079",
"flushLagBytes": 0,
"flushLagSize": "0 bytes",
"host": "54.224.193.194",
"isThisNode": false,
"nodeID": "2dda1a9e4de4438b9fab1c7efc95bea8",
"port": 5432,
"publicAddress": "54.224.193.194",
"replayLSN": "0/3E1A700",
"replayLag": "00:00:00.00079",
"replayLagBytes": 0,
"replayLagSize": "0 bytes",
"replicationBlocked": false,
"sentLSN": "0/3E1A700",
"sentLagBytes": 0,
"sentLagSize": "0 bytes",
"state": "streaming",
"status": {
"code": "d",
"description": "down"
},
"writeLSN": "0/3E1A700",
"writeLag": "00:00:00.00079",
"writeLagBytes": 0,
"writeLagSize": "0 bytes"
},
"summary": "BAD:node is inactive"
},
"f374bb3eab384692843e8dc60a9c2da2": {
"clusterErrors": [
{
"errorMessage": "could not connect to the postgresql server in replication mode: timeout expired\n",
"errorTime": "2022-08-24T06:20:36.973754Z"
}
],
"nodeInfo": {
"applyRate": 936,
"catchupInterval": "00:00:00",
"connected": false,
"flushLSN": "0/3E1A700",
"flushLag": "00:00:00.000856",
"flushLagBytes": 0,
"flushLagSize": "0 bytes",
"host": "54.144.33.167",
"isThisNode": false,
"nodeID": "f374bb3eab384692843e8dc60a9c2da2",
"port": 5432,
"publicAddress": "54.144.33.167",
"replayLSN": "0/3E1A700",
"replayLag": "00:00:00.000856",
"replayLagBytes": 0,
"replayLagSize": "0 bytes",
"replicationBlocked": false,
"sentLSN": "0/3E1A700",
"sentLagBytes": 0,
"sentLagSize": "0 bytes",
"state": "streaming",
"status": {
"code": "d",
"description": "down"
},
"writeLSN": "0/3E1A700",
"writeLag": "00:00:00.000856",
"writeLagBytes": 0,
"writeLagSize": "0 bytes"
},
"summary": "BAD:node is inactive"
},
"f50adf6919c047f0b83c2bf73553f962": {
"clusterErrors": null,
"nodeInfo": {
"host": "18.234.75.110",
"isThisNode": true,
"nodeID": "f50adf6919c047f0b83c2bf73553f962",
"port": 5432,
"publicAddress": "18.234.75.110",
"status": {
"code": "r",
"description": "ready"
}
},
"summary": "GOOD"
}
},
"lastUpdated": "2022-08-24T06:21:05.581794Z"
}
}
}
The response displays:
summary of each node
cluster error information, if any
human readable consolidated summary
last updated information
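The summary response nests each node's view of every peer, so a health check needs to walk two levels. The sketch below is illustrative only (field names come from the trimmed sample response above; the helper is not part of ksctl):

```python
import json

# Trimmed sample in the shape of the `ksctl cluster summary` response.
raw = '''
{
  "resources": {
    "node1": {
      "clusterSummary": {
        "node1": {"clusterErrors": null,
                  "nodeInfo": {"host": "10.0.0.1", "replayLagBytes": 0},
                  "summary": "GOOD"},
        "node2": {"clusterErrors": [{"errorMessage": "timeout expired"}],
                  "nodeInfo": {"host": "10.0.0.2", "replayLagBytes": 4096},
                  "summary": "BAD:node is inactive"}
      }
    }
  }
}
'''

def bad_peers(response: dict) -> list:
    """Collect (observing node, peer host, summary) for peers not GOOD."""
    out = []
    for observer, view in response["resources"].items():
        for peer in view["clusterSummary"].values():
            if peer["summary"] != "GOOD":
                out.append((observer, peer["nodeInfo"]["host"], peer["summary"]))
    return out

issues = bad_peers(json.loads(raw))
```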
To join a node to a cluster
When joining a node to a cluster, three steps need to happen:
Get the csr from the joining node by entering the command ksctl cluster csr
Get the certificate and CA chain from a member node by entering the command ksctl cluster nodes create
Run the join command from the joining node by entering the command ksctl cluster join
Caution
If you have recently restored a backup on the CipherTrust Manager, make sure the restore operation is complete and all services are up and running. Then, you can join the CipherTrust Manager to a cluster. If you join a cluster before restore is complete, this can result in other nodes in the cluster failing to reboot.
Caution
The join operation might take some time to complete, depending on network speed and database size. Completion times of 30 minutes are not unusual.
While a join is underway, do not restart the system, as this can cause the join to fail and the node to disappear from the cluster. There is also no need to manually restart the system after the join completes.
You can always view the node status to check progress while a join is underway. During the join, the node displays a joining status and indicates progress through five stages: creating, bootstrapping, initial sync, catching up, and completing; the down status can also appear for a short time. If there is a failure during joining, the node status is killed, unknown, or a persistent down. Check that the join operation has completed and that the node status is ready before joining another node.
While all of these commands can be run individually, it is simpler to use the fulljoin command, which performs all of these steps. For the cluster fulljoin command, you must provide a member's IP or DNS name, the joining node's IP or DNS name, and either the joining node's configuration file or its username, password, and URL. To complete the process using the fulljoin command, enter:
$ ksctl cluster fulljoin --member=<member_IP_or_DNS> --newnodehost=<joining_IP_or_DNS> --newnodeconfig=<config_File_of_Joining_Node>
As with the individual commands, you must wait until the join operation has completed and the node status is ready before joining another node.
An example of the cluster fulljoin command using a configuration file:
$ ksctl cluster fulljoin --member=memberIPOrDNS --newnodehost=joiningIPOrDNS --newnodeconfig=configFileOfJoiningNode
Response:
When you add a node to a cluster, the existing data of the node is deleted. Are you sure want to join? [y/N]
Attempting to get the CSR from the joining node...
Finished getting the CSR from the joining node...
Attempting to get the Certificate and CA Chain from the member node...
Finished getting the Certificate and CA Chain from the member node...
Attemping to join the new node to the member's cluster...
{
"nodeID": "",
"status": {
"code": "creating",
"description": "joining: creating (1/5)"
}
}
When a node joins a cluster, the node adopts the credentials of the cluster. Would you like to write the new cluster credentials
to the provided configuration file? [y/N]
y
An example of the cluster fulljoin command without using a configuration file:
$ ksctl cluster fulljoin --member=memberIPOrDNS --newnodehost=joiningIPOrDNS --newnodepass=joiningNodePass --newnodeuser=joiningNodeUser --newnodeurl=joiningNodeURL
Response:
When you add a node to a cluster, the existing data of the node is deleted. Are you sure want to join? [y/N]
y
Attempting to get the CSR from the joining node...
Finished getting the CSR from the joining node...
Attempting to get the Certificate and CA Chain from the member node...
Finished getting the Certificate and CA Chain from the member node...
Attemping to join the new node to the member's cluster...
{
"nodeID": "",
"status": {
"code": "creating",
"description": "joining: creating (1/5)"
}
}
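Both join methods require waiting until the node status is ready before joining another node. That wait can be automated by polling the status; the sketch below is illustrative only and injects the status lookup as a function (in practice it would wrap ksctl cluster nodes get, which this example does not call):

```python
import time

def wait_until_ready(get_status, attempts: int = 60, delay: float = 0.0) -> bool:
    """Poll get_status() until it returns 'ready' or attempts are exhausted."""
    for _ in range(attempts):
        if get_status() == "ready":
            return True
        time.sleep(delay)
    return False

# Simulated join: the node reports joining stages, then becomes ready.
stages = iter(["joining: creating (1/5)", "joining: initial sync (3/5)",
               "joining: completing (5/5)", "ready"])
result = wait_until_ready(lambda: next(stages))
```

In a real script, the delay between polls would be on the order of seconds to minutes, since joins can take 30 minutes or more.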
To remove a node from a cluster
To remove a node from a cluster, first remove the node from the cluster, then delete the cluster configuration on the removed node. Deleting the cluster configuration allows the node to rejoin the original cluster, join another cluster, or create a new cluster.
Remove the node from the cluster. A node cannot remove itself, so you must run this command on another node in the cluster:
$ ksctl cluster nodes delete --id=ebab0738-6e09-4b0d-8c99-850d7f24dfac
Delete the cluster configuration on the removed node:
$ ksctl cluster delete
Note
If ksctl cluster delete doesn't work for any reason, it is always possible to perform a full system reset to ensure any leftover data is removed from the node. A node can be reset using the ksctl services reset command.
Warning
Refer to section Services Reset for important information on using this command.
To rejoin a node to a cluster
Before rejoining, it is recommended to reset the node.
To reset the node, enter the command:
$ ksctl services reset
Warning
Refer to section Services Reset for important information on using this command.
Join the node as described in To join a node to a cluster.
To create a new cluster from a removed node
Ensure the node has been successfully removed from the cluster and that ksctl cluster delete has been performed, as described in To remove a node from a cluster.
Create a new cluster as described in To create a new cluster.
Cluster Upgrade
A cluster can be upgraded in two ways: in-place, or by using the cluster remove/rebuild method.
Prerequisites
You require ksadmin level access with an SSH key.
Obtain the signed archive file for the upgrade from the Support Portal. The file has the format ks_upgrade_<major.minor.patch+build_number>.tar.gz.gpg.
In-place Cluster Upgrade
A cluster can be upgraded in-place since version 1.9.0. The upgrade is generally limited to one minor version at a time, for example, from 2.7.0 to 2.8.0; from 2.8.0 to 2.9.0; or from 2.9.0 to 2.10.0. Be aware of the following considerations when performing an in-place cluster upgrade.
The node being upgraded will be inaccessible during the upgrade. This may take 10 minutes or more. Clients must be able to handle this outage.
There will be a brief period (under 30 seconds) during which the database is locked while the first node is upgraded. This affects all nodes at the same time, and some nodes may return error responses during this time.
All nodes in the cluster should be upgraded as soon as possible; nodes running different versions of the firmware will behave differently, potentially causing problems with applications.
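Since in-place upgrades are generally limited to one minor version at a time, a planned upgrade path can be sanity-checked step by step. The sketch below is illustrative only; the actual constraint is enforced by the product, not by this helper:

```python
def is_single_minor_step(current: str, target: str) -> bool:
    """True if target is the same major version and exactly one minor ahead."""
    cur_major, cur_minor, _ = (int(p) for p in current.split("."))
    tgt_major, tgt_minor, _ = (int(p) for p in target.split("."))
    return tgt_major == cur_major and tgt_minor == cur_minor + 1

ok_step = is_single_minor_step("2.9.0", "2.10.0")    # one minor version ahead
bad_step = is_single_minor_step("2.7.0", "2.9.0")    # skips 2.8.0
```

To go from 2.7.0 to 2.10.0, for example, you would upgrade the whole cluster through 2.8.0 and 2.9.0 first.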
To perform an in-place cluster upgrade
Before doing any upgrade operation, ensure that you have a backup, and that you have downloaded the backup and associated backup key.
Ensure all nodes in the cluster are up and operating normally. Resolve any issues (like removing any obsolete nodes) before performing the upgrade.
Perform a system upgrade on each node, one at a time.
scp the archive file to the CipherTrust Manager. You require the private SSH key associated with the ksadmin account.
scp -i <path_to_private_SSH_key> <archive_file_name> ksadmin@<ip>:.
SSH into the CipherTrust Manager as ksadmin and ensure there is at least 12 GB of space available (not including the upgrade file) before proceeding. Use df -h / to view available space.
Run the following command to upgrade the device:
$ sudo /opt/keysecure/ks_upgrade.sh -f <archive_file_path>
Here, <archive_file_path> specifies the CipherTrust Manager path to the signed archive file. The signature of the archive file is verified and the upgrade is applied.
Reboot the appliance as prompted.
Ensure the upgrade of each node is complete and that the node is operating normally before proceeding to the next node.
From the ksadmin session, run systemctl status keysecure to ensure that the CipherTrust Manager services have started. Alternatively, you can visit the CipherTrust Manager web console or attempt to connect with the ksctl CLI.
Note
When updating the first node in a cluster, the cluster nodes may briefly experience slower than usual response times. This occurs because the shared database schema for the cluster is updated with the first node.
Upgrade using the cluster remove/rebuild method
On one of the cluster nodes, create and download a backup with corresponding backup key, in case there are any problems.
Remove all nodes from the cluster except one.
Delete the cluster configuration on the remaining node.
$ ksctl cluster delete
Perform the upgrade on the remaining node.
scp the archive file to the CipherTrust Manager. You require the private SSH key associated with the ksadmin account.
scp -i <path_to_private_SSH_key> <archive_file_name> ksadmin@<ip>:.
SSH into the CipherTrust Manager as ksadmin and ensure there is at least 12 GB of space available (not including the upgrade file) before proceeding. Use df -h / to view available space.
Run the following command to upgrade the device:
$ sudo /opt/keysecure/ks_upgrade.sh -f <archive_file_path>
Here, <archive_file_path> specifies the CipherTrust Manager path to the signed archive file. The signature of the archive file is verified and the upgrade is applied.
Reboot the appliance as prompted.
After the appliance reboots, re-build the cluster by creating a new cluster on this node.
Perform the upgrade on all other removed nodes.
Note
If a previously used node is to be reused, the cluster configuration must first be deleted from that node.
Join new instances to the cluster.