Planning Your Deployment

This section describes the supported configurations and any limitations or constraints to consider when setting up an HA group.

HA Group Members

All members in an HA group must have the same configuration and version. Each HA group member must use the same authentication method, either PED-authenticated or password-authenticated, and be at the same software version. Running HA groups with different versions is unsupported. Configure the HSMs identically to ensure smooth high availability and load balancing operation. SafeNet Luna HSMs come with various key management configurations: cloning mode, key-export mode, etc. HA functionality is supported with cloning, provided all members in the group have the same configuration. Clients automatically and transparently use the correct secure key replication method based on the group’s configuration.

It is also critical that all members in an HA group share the same Security Domain role (Red PED key for PED-authenticated devices, or domain password for password-authenticated devices). The Security Domain defines which HSMs are allowed to share key material. Because HA group members are, by definition, intended to be peers, they must be in the same Security Domain.

The SafeNet HA and load-balancing feature works on a per-client and per-partition basis. This provides a lot of flexibility. For example, it is possible to define a different subset of HSMs in each client, and even in each client’s partitions (in the event that a single client uses multiple partitions). SafeNet recommends avoiding these complex configurations and keeping the HA topology uniform for an entire HSM. That is, treat HSM members at the HSM level as atomic and whole. This simplifies the configuration management associated with the HA feature.

Mix and Match Appliance Software is Not Supported

All SafeNet Luna Network HSM appliances in an HA group must be running the same appliance software version. Before attempting to create an HA group, ensure that all of the appliances used to host the HA members are running the same appliance software. In addition, it is recommended that your client software is at the same software version as the appliance.

Mix and Match HSM Firmware, Capabilities, and FIPS Setting is Not Recommended

The HSM firmware, capabilities, and FIPS setting define which mechanisms are available, and how they can be used. To ensure that all objects in an HA slot can be successfully cloned to all members of the HA group, ensure that all members of a production HA group are at the same firmware level, have the same set of capabilities installed, and use the same FIPS setting. If mismatches exist between members, HSM operations or HA synchronization might fail if your application attempts to use a mechanism or a capability that not all members support.

To ensure minimal disruption during firmware or capability updates, your HA group will continue to function if there are differences in firmware, capabilities, or FIPS setting between the HA group members. Where differences exist, the capability of the group (in terms of features and available algorithms) is that of the member with the oldest firmware. It is recommended that you limit periods where mismatches are present to the maintenance windows used to apply firmware or capability upgrades.
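The uniformity check described above can be sketched as a small pre-flight script. This is a minimal illustration, not a SafeNet tool: the Member record, its field names, and the version strings are all hypothetical, and you would need to populate them from your own inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    # Hypothetical inventory record; these fields are illustrative only
    # and do not correspond to any SafeNet API.
    label: str
    firmware: str
    fips_mode: bool
    capabilities: frozenset


def find_mismatches(members):
    """Return the attribute names whose values differ across members."""
    mismatched = []
    for attr in ("firmware", "fips_mode", "capabilities"):
        values = {getattr(m, attr) for m in members}
        if len(values) > 1:
            mismatched.append(attr)
    return mismatched


# Invented example values: hsm-c is still on an older firmware.
members = [
    Member("hsm-a", "fw-new", True, frozenset({"cloning"})),
    Member("hsm-b", "fw-new", True, frozenset({"cloning"})),
    Member("hsm-c", "fw-old", True, frozenset({"cloning"})),
]

print(find_mismatches(members))  # ['firmware']
```

An empty result suggests the group is uniform on the attributes checked. A non-empty result is acceptable only for the duration of a maintenance window.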

Example

Assume you have an HA group that includes HSMs with two different firmware versions. In this case, certain capabilities that are part of the newer firmware are unavailable to clients connecting to the HA group. Specifically, operations that make use of newer cryptographic mechanisms and algorithms would likely fail. The client's calls might be initially assigned to a newer-firmware HSM and could therefore appear to work for a time, but if the task is load-balanced to an HSM that does not support the newer features, it would fail. Similarly, if the newer-firmware HSM dropped out of the group, operations requiring the newer firmware would fail.
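One way to reason about this example is to treat the set of mechanisms that is safe to use as the intersection of what every member supports. The sketch below models that assumption. The member labels and mechanism names are hypothetical placeholders, not a statement of what any particular firmware supports.

```python
def common_mechanisms(mechanisms_by_member):
    """Mechanisms supported by every member (empty set if no members)."""
    sets = [set(mechs) for mechs in mechanisms_by_member.values()]
    return set.intersection(*sets) if sets else set()


def members_lacking(mechanisms_by_member, mechanism):
    """Labels of members that do not support the given mechanism."""
    return sorted(
        label
        for label, mechs in mechanisms_by_member.items()
        if mechanism not in mechs
    )


# Hypothetical data: one member has not yet received the newer firmware.
group = {
    "hsm-new-1": {"MECH_BASE", "MECH_NEWER"},
    "hsm-new-2": {"MECH_BASE", "MECH_NEWER"},
    "hsm-old": {"MECH_BASE"},
}

print(common_mechanisms(group))              # {'MECH_BASE'}
print(members_lacking(group, "MECH_NEWER"))  # ['hsm-old']
```

An operation that uses MECH_NEWER succeeds only while it is scheduled on one of the newer members, which matches the intermittent failures described in the example.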

HA Group Members Must Not Be on the Same Appliance

In any one HA group, always ensure that member partitions or member PKI tokens (USB-attached SafeNet Luna USB HSMs, or SafeNet CA4/PCM token HSMs in a USB-attached SafeNet DOCK2 card reader) are on separate appliances. Do not attempt to include more than one HSM partition or PKI token (nor one of each) from the same appliance in a single HA group. This is not a supported configuration. Allowing two partitions from one HSM, or a partition from the HSM and an attached HSM (as for PKI), into a single HA group would defeat the purpose of HA by making the SafeNet appliance a potential single point of failure.
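A planned member list can be checked against this rule before the group is created. The following is a minimal sketch that assumes you keep a list of (member label, appliance hostname) pairs. The labels and hostnames shown are hypothetical.

```python
from collections import Counter


def appliances_with_multiple_members(members):
    """Return appliances that host more than one planned HA member.

    `members` is a list of (member_label, appliance_hostname) pairs.
    """
    counts = Counter(appliance for _, appliance in members)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical plan: two partitions come from the same appliance.
planned = [
    ("partition-1", "appliance-a"),
    ("partition-2", "appliance-b"),
    ("partition-3", "appliance-a"),
]

print(appliances_with_multiple_members(planned))  # ['appliance-a']
```

Any appliance returned by the check would be a single point of failure for the group, so the plan should be revised until the result is empty.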

Running HA on a group of export SafeNet Luna Network HSM appliances

This configuration is supported, although you cannot clone/replicate private keys.

High Availability Group Sizing

As of SafeNet Luna HSM release 6.x, the high availability function supports the grouping of up to thirty-two members. However, the maximum practical group size for your application is driven by a trade-off between performance and the cost of replicating key material across the entire group. A common practice is to set the group size to N+1 where N is defined by the desired performance per application server(s). As depicted below, this solution gives the desired performance with a single extra HSM providing the availability requirement. The number of HSMs per group of application servers varies based on the application use case but, as depicted, groups of three are typical.

As performance needs grow beyond the performance capacity of three HSMs, it often makes sense to define a second independent group of application servers and HSMs to further isolate applications from any single point of failure. This has the added advantage of facilitating the distribution of HSM and application sets in different data centers.
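The N+1 sizing rule can be worked through as simple arithmetic. The sketch below assumes you have measured a per-HSM throughput for your own workload. The function name, parameters, and throughput figures are illustrative assumptions; only the limit of thirty-two members comes from this section.

```python
import math

MAX_GROUP_SIZE = 32  # group size limit stated in this section


def ha_group_size(required_ops_per_sec, ops_per_sec_per_hsm, spares=1):
    """Return N + spares, where N HSMs are needed to meet the required load."""
    if required_ops_per_sec <= 0 or ops_per_sec_per_hsm <= 0:
        raise ValueError("throughput values must be positive")
    n = math.ceil(required_ops_per_sec / ops_per_sec_per_hsm)
    size = n + spares
    if size > MAX_GROUP_SIZE:
        raise ValueError("exceeds the maximum group size; consider a second group")
    return size


# Invented figures: 1800 ops/s required, 1000 ops/s measured per HSM.
print(ha_group_size(1800, 1000))  # 3
```

Here two HSMs carry the load (N = 2) and one more provides availability, giving the group of three that this section describes as typical.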

Network Requirements

The network topology of the HA group is generally not important to the proper functioning of the group. As long as the client has a network path to each member, the HA logic will function. Keep in mind that a wide range of latencies between the client and the HA members biases command scheduling towards the low-latency members. It also means that commands scheduled on the high-latency devices take longer overall. In this case, the command latency is a characteristic of the network; to achieve uniform load distribution, ensure that latencies to each device in the group are similar (with the exception of standby members, which do not contribute to network load). Gigabit Ethernet network connections are recommended.
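To spot members whose latency is likely to skew load distribution, measured round-trip times can be compared against the group median. This is a rough sketch only: the 25% tolerance is an arbitrary assumption rather than a SafeNet recommendation, and the latency figures are invented.

```python
import statistics


def latency_outliers(latency_ms, tolerance=0.25):
    """Labels of members whose latency deviates from the group median
    by more than `tolerance` (expressed as a fraction of the median)."""
    median = statistics.median(latency_ms.values())
    return sorted(
        label
        for label, ms in latency_ms.items()
        if abs(ms - median) > tolerance * median
    )


# Invented measurements in milliseconds, for active members only.
measured = {"hsm-a": 1.0, "hsm-b": 1.1, "hsm-c": 9.0}

print(latency_outliers(measured))  # ['hsm-c']
```

Standby members can be left out of the measurements, since they do not receive commands during normal operation.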

Upgrading, Redundancy, and Rotation

For the SafeNet Luna Network HSM HA function, we suggest that all SafeNet Luna Network HSM appliances in an HA group be at the same appliance software and firmware level. The issue is not the firmware level itself. Rather, newer firmware might contain newer algorithms that are not supported in the older firmware. If your client is configured to take advantage of newer or better algorithms when they become available, it might do so while one member of an HA group has the new firmware but another member has not yet been updated, and therefore does not yet support the requested algorithm. The client might not be able to interpret the resulting imbalance. Therefore, when you intend to upgrade or update any of the SafeNet Luna Network HSM units in an HA group, or the SafeNet Luna Network HSM Client software, consider scheduling some downtime for your application if you anticipate a problem.

If the application is so critical that you cannot permit that much scheduled downtime, you can set up a second complete set consisting of a Client computer and its associated HA group. One set can service the application load while the other set is being upgraded or otherwise maintained. For such uptime-critical applications, you might already have a backup set of Client plus HA group that you rotate in and out of service during regular maintenance windows.