High-Availability Groups
Luna HSMs can provide scalability and redundancy for cryptographic applications that are critical to your organization. For applications that require continuous, uninterruptible uptime, the Luna HSM Client allows you to combine application partitions on multiple HSMs into a single logical group, known as a High-Availability (HA) group.
High Availability Options for Luna HSMs
Luna HSMs support two, non-overlapping approaches to High Availability:
>client-mediated HA creation and management of HA groups, by means of the LunaCM hagroup commands, with preparation, management, and usage described at High-Availability Groups (this is believed to be the majority of HA implementations among Luna HSM customers)
versus
>extensions to the PKCS#11 API to allow interested customers to create and maintain High Availability programmatically.
That is, if you want the redundancy and performance of High Availability,
>you can do what many Luna HSM customers do and use LunaCM commands to declare and manage HA groups and their membership, while relying on the Luna Client library to carry out the day-to-day infrastructure and activity
or
>you can program your own complete HA environment, in full, or tie into (integrate with) suitable COTS High Availability solutions that provide a suitable Application Interface, using HA Indirect Login calls to handle common authentication among HSM partitions in groups.
Client-driven High Availability
This section is about the Client-driven HA feature, traditionally used by most Luna HSM customers.
This feature is best suited to provide redundancy to the Network HSM and PCI-E HSM products. Network HSM partitions can be grouped with other Network HSM partitions or with a Luna Cloud HSM service. PCI-E HSM partitions can be grouped with other PCI-E HSM partitions or with a Luna Cloud HSM Service.
An HA group allows your client application to access cryptographic services as long as one member HSM is functional and network-connected. This allows you to perform maintenance on any individual member without ever pausing your application, and provides redundancy in the case of individual failures. Cryptographic requests are distributed across all active group members, enabling a performance gain for each member added. Cryptographic objects are replicated across the entire group, so HA can also be used to keep a current, automatic, remote backup of the group contents.
HA functionality is handled by the Luna HSM Client software. The individual partitions have no way to know they are configured in an HA group, so you can configure HA on a per-application basis. The way you group your HSMs depends on your circumstances and desired performance.
This chapter contains the following sections:
>Planning Your HA Group Deployment
>Setting an HA Group Member to Standby
>Enabling/Disabling HA Only Mode
>Adding/Removing an HA Group Member
>Manually Recovering a Failed HA Group Member
>General guidelines for updating or converting of HA member partitions
Performance
For repetitive operations (for example, many signings using the same key), an HA group provides linear performance gains as group members are added. The best approach is to maintain an HA group at a size that best balances application server capability and the expected loads, with an additional unit providing capacity for bursts of traffic.
For best overall performance, keep all group members running near their individual performance ideal, about 30 simultaneous threads per HSM. If you assemble an HA group that is significantly larger than your server(s) can manage, you might not achieve full performance from all members. Gigabit Ethernet connections are recommended to maximize performance.
Performance is also affected by the kind of cryptographic operations being requested. For some operations, an HA group can actually hinder performance by requiring extra operations to replicate new key objects. For example, if the operation involves importing and unwrapping keys:
Using an HA group | Using an individual partition |
---|---|
1.Encryption (to wrap the key) 2.Decryption on the primary member partition (to unwrap the key) 3.Object creation on the primary member partition (the unwrapped key is created and stored as a key object) 4.Key replication across the HA group: a.RSA 4096-bit operation is used to derive a shared secret between HSMs b.Encryption of the key on the primary HA member using the shared secret c.Decryption of the key on each HA member using the shared secret d.Object creation on each HA member 5.Encryption (using the unwrapped key object to encrypt the data) |
1.Encryption (to wrap the key) 2.Decryption (to unwrap the key) 3.Object creation (the unwrapped key is created and stored as a key object) 4.Encryption (using the unwrapped key object to encrypt the data) |
In this case, the HA group must perform many more operations than an individual partition, most significantly the RSA-4096-bit operation and creating the additional objects. Those two operations are by far the most time-consuming on the list, and so this task would have much better performance on an individual partition.
The crucial HA performance consideration is whether the objects on the partitions are constant, or always being created and replaced. If tasks make use of already-existing objects, those objects exist on all HA group members; operations can be performed by different group members, boosting performance. If new objects are created, they must be replicated across the entire group, causing a performance loss.
NOTE The way your application uses the C_FindObjects function to search for objects in a virtual HA slot can have a significant impact on your application performance (see Application Object Handles).
Load Balancing
Cryptographic requests sent to the HA group's virtual slot are load-balanced across all active members of the HA group. The load-balancing algorithm sends requests for cryptographic operations to the least busy partition in the HA group. This scheme accounts for operations of variable length, ensuring that queues are balanced even when some partitions are assigned very long operations. When an application requests a repeated set of operations, this method works. When the pattern is interrupted, however, the request type becomes relevant, as follows:
>Single-part (stateless) cryptographic operations are load-balanced.
>Multi-part (stateful) cryptographic operations are load-balanced.
>Multi-part (stateful) information retrieval requests are not load-balanced. In this case, the cost of distributing the requests to different HA group members is generally greater than the benefit. For this reason, multi-part information retrieval requests are all targeted at one member.
> Key management requests are not load-balanced. Operations affecting the state of stored keys (creation, deletion) are performed on a single HA member, and the result is then replicated to the rest of the HA group.
For example, when a member partition is signing and an asymmetric key generation request is issued, additional operations on that member are queued while the partition generates the key. In this case, the algorithm schedules more operations on other partitions in the HA group.
The load-balancing algorithm operates independently in each application process. Multiple processes on the same client or on different clients do not share information when scheduling operations. Some mixed-use cases might cause applications to use some partitions more than others (see Planning Your HA Group Deployment). If you increase key sizes, interleave other cryptographic operations, or if network latency increases, performance may drop for individual active members as they become busier.
NOTE Partitions designated as standby members are not used to perform cryptographic operations, and are therefore not part of the load-balancing scheme (see Standby Members).
The Primary Partition
The primary partition is the first partition you specify as a member of the HA group. While cryptographic operations are load-balanced across all the partitions in the group, new keys are always created on the primary partition, and then replicated on the other partitions (see Key Replication). Depending on how many new keys you are creating on your HA group, this can mean that the primary partition has a heavier workload than the other partitions in the group. If your HSMs are in different remote locations, you could select one with the least latency as the primary partition.
Despite its name, the primary partition is not more critical than any other partition in the HA group. If the primary partition fails, its operations fail over to other partitions in the group, and the next member added to the group becomes the new primary partition.
Network Topography
The network topography of the HA group is generally not important to the functioning of the group. As long as the client has a network path to each member, the HA logic will function. Different latencies between the client and each HA member cause a command scheduling bias towards the low-latency members. Commands scheduled on the long-latency devices have a longer overall latency associated with each command.
In this case, the command latency is a characteristic of the network. To achieve uniform load distribution, ensure that partitions in the group have similar network latency.
Key Replication
When an application creates a key on the virtual HA slot, the HA library automatically replicates the key across all group members before reporting back to the application. Keys are created on one member partition and replicated to the other members. If a member fails during this process, the HA group reattempts key replication to that member until it recovers, or failover attempts time out. Once the key exists on all active members of the HA group, a success code is returned to the application.
Key replication, for pre-firmware-7.7.0 HSM partitions and for V0 partitions, uses the Luna cloning protocol, which provides mutual authentication, confidentiality, and integrity for each object that is copied from one partition to another. Therefore, all HA group member partitions must be initialized with the same cloning domain.
Key replication for V1 partitions uses the Luna cloning protocol to ensure that all HA group members have the same SMK, and uses SKS to export a key originating at one member and to import and decrypt that key (using the common SMK) on each other member in the group.
Again, all HA group member partitions must be initialized with the same cloning domain in order that the common SMK can be available on every member.
The cloning protocol (for pre-firmware-7.7.0 or V0 or mixed-version HA), or the SKS protocol (for for blob transfers in a V1 partition HA) is invoked separately for each object to be replicated and the sequence of required calls must be issued by an authorized client library residing on a client platform that has been authenticated to each of the partitions in the HA group). That is, the client must have an authenticated NTLS or STC channel to the HSM appliance; only an authorized client can perform object synchronization across all HA members.
NOTE Objects (session or token) are replicated immediately to all members in an HA group when they are generated in the virtual HA slot. Similarly, deletion of objects (session or token) from the virtual HA slot is immediately replicated across all group members.
If your application bypasses the virtual slot and creates or deletes directly in a physical member slot, the action occurs only in that single physical slot, and can be overturned by the next synchronization operation. For this reason we generally advise to enable HA-only, unless you have specific reason to access individual physical slots, and are prepared (in your application) to perform the necessary housekeeping.
Failover
When any active HA group member fails, a failover event occurs – the affected partition is dropped from the list of available HA group members, and all operations that were pending on the failed partition are transparently rescheduled on the remaining member partitions. The Luna HSM Client continuously monitors the health of member partitions at two levels:
> network connectivity – disruption of the network connection causes a failover event after a 20-second timeout.
>command completion – any command that is not executed within 20 seconds causes a failover event.
NOTE Most commands are completed within milliseconds. Some can take longer, either because the command itself is time-consuming (for example, key generation), or because the HSM is under extreme load. The HSM automatically sends a "heartbeat" signal every two seconds for commands that are pending or in progress. The client extends the 20-second timeout whenever it receives a heartbeat, preventing false failover events.
When an HA group member fails, the HA group status (see hagroup listgroups) reports a device error for the failed member. The client tries to reconnect the failed member at a minimum retry rate of once every 60 seconds, for the specified number of times (see Recovery).
When a failover occurs, the application experiences a latency stall on the commands in process on the failing unit, but otherwise there is no impact on the transaction flow. The scheduling algorithm described in Load Balancing automatically minimizes the number of commands that stall on a failing unit during the 20-second timeout.
As long as one HA group member remains functional, cryptographic service is maintained no matter how many other group members fail. As described in Recovery, members can be returned to service without restarting the application.
Mid-operation failures
Any operation that fails mid-point needs to be re-sent from the calling application. The entire operation returns a failure (CKR_DEVICE_ERROR). This is more likely to happen in a multi-part operation, but a failure could conceivably happen during a single atomic operation as well.
For example, multi-part operations could be block encryption/decryption or any other command where the previous state of the HSM is critical to the processing of the next command. These operations must be re-sent, since the HA group does not synchronize partitions' internal memory state, only the stored key material.
NOTE You must ensure that your applications can deal with the rare possibility of a mid-operation failure, by re-issuing the affected commands.
Possible Causes of Failure
In most cases, a failure is a brief service interruption, like a system reboot. These temporary interruptions are easily dealt with by the failover and auto-recovery functions. In some cases, additional actions may be required before auto-recovery can take place.
>HSM-side failures
•HSM card failure
•HSM re-initialization
•HSM reboot
•HSM power failure
•Deactivated partition
•NTLS service failure
•STC service failure
>Client-side failures
•Client workstation power failure
•Client workstation reboot
Recovery
Recovery of a failed HA group member is designed to be automatic in as many cases as possible. You can configure your auto-recovery settings to require as much manual intervention as is convenient for you and your organization. In either an automated or manual recovery process, there is no need to restart your application. As part of the recovery process:
>Any cryptographic objects created while the member was offline are automatically replicated to the recovered partition.
>The recovered partition becomes available for its share of load-balanced cryptographic operations.
Auto-recovery
When auto-recovery is enabled, Luna HSM Client performs periodic recovery attempts when it detects a member failure. You can adjust the frequency (maximum once per minute) and the total number of retries (no limit). If the failed partition is not recovered within the scheduled number of retries, it remains a member of the HA group, but the client will no longer attempt to recover it. You must then address whatever equipment or network issue caused the failure, and execute a manual recovery of the member partition.
With each recovery attempt, a single application thread experiences a slight latency delay of a few hundred milliseconds while the client uses the thread to recover the failed member partition.
There are two HA auto-recovery modes:
>activeBasic – uses a separate, non-session-based Active Recovery Thread to perform background checks of HA member availability, recover failed members, and synchronize the contents of recovered members with the rest of the group. It does not restore existing sessions if all members fail simultaneously and are recovered.
>activeEnhanced – works the same as activeBasic, but restores existing sessions and login states if all members fail and are recovered.
HA auto-recovery is disabled by default. It is automatically enabled when you set the recovery retry count (see Configuring HA Auto-Recovery). Thales recommends enabling auto-recovery in all configurations.
NOTE If a member partition loses Activation when it fails (it remains offline for more than two hours) you must present the black Crypto Officer PED key to re-cache the PED secret before the member can be recovered.
Manual Recovery
When auto-recovery is disabled, or fails to recover the partition within the scheduled number of retries, you must execute a manual recovery in LunaCM. Even if you use manual recovery, you do not need to restart your application. When you execute the recovery command, the client makes a recovery attempt the next time the application uses the group member (see Manually Recovering a Failed HA Group Member).
Even with auto-recovery enabled and configured for a large number of retries, there are some rare occasions where a manual recovery may be necessary (for example, when a member partition and the client application fail at the same time).
CAUTION! Never attempt a manual recovery while the application is running and auto-recovery is enabled. This can cause multiple concurrent recovery processes, resulting in errors and possible key corruption.
Failure of All Group Members
If all members of an HA group fail (and no standby members are configured), all logged-in sessions are lost, and operations that were active when the last member failed are terminated. If you have set the HA auto-recovery mode to activeEnhanced, all sessions will be restarted when one or more members are recovered, and normal operations will resume. Otherwise, you must restart the client application once the group members have been recovered.
Permanent Failures
Sometimes an HSM failure is permanent (from the perspective of the HA group). For example, if the HSM is re-initialized, the member partition is erased and must be recreated. In this case, you can decide to recreate the original member or deploy a new member to the group. The client automatically replicates cryptographic objects to the new member and begins assigning operations to it (see Replacing an HA Group Member).
Standby Members
After you add member partitions to an HA group, you can designate some as standby members. Cryptographic objects are replicated on all members of the HA group, including standby members, but standby members do not perform any cryptographic operations unless all the active members go offline. In this event, all standby members are immediately promoted to active service, and operations are load-balanced across them. This provides an extra layer of assurance against a service blackout for your application.
Since standby members replicate keys but do not perform operations, they can also serve as an automatic backup partition for the cryptographic objects on the HA group. The contents of standby partitions are always kept up-to-date, so it is not possible to keep multiple backups (different generations of preserved material) using an HA group (see Planning Your HA Group Deployment). You can consider HA standby members to be your backup only in the case where the most recent sync always replicates all objects you are interested in preserving and recovering.
If you have audit-compliance rules or other mandate to preserve earlier partition contents (keys and objects), then you should perform intentional backups with dedicated backup devices (see Backup and Restore Using a Luna Backup HSM (G5)
Process Interaction
At the lowest communication level, the transport protocol (TCP) maintains communication between the client and the appliance (whether HA is involved or not). For HA groups involving member partitions on Luna PCIe HSM, the protocol timeout is 10 seconds. This means:
>In a period of no activity by client or appliance, the appliance's TCP will wonder if the client is still there, and send a packet after 10 seconds of silence.
>If that packet is acknowledged, the 10-second TCP timer restarts, and the cycle repeats indefinitely.
>If the packet is not acknowledged, TCP sends another every 10 seconds. If there is no response after 2 minutes, the connection is considered dead, and higher levels are alerted to perform their cleanup.
Above that level, the NTLS/STC layer provides the connection security and some other services. Any time a client sends a request for a cryptographic operation, the HSM on the appliance begins working on that operation.
While the HSM processes the request, appliance-side NTLS/STC sends a "keep-alive" ping every 2 seconds, until the HSM completes the request. NTLS/STC does not perform any interpretation of the ping, but simply keeps the TCP layer active. If your client application requests a lengthy operation (for example, an 8192-bit keygen), the random-number-generation portion of that operation could take minutes, during which the HSM would legitimately be sending nothing back to the client. The NTLS ping ensures that the connection remains alive during long pauses.
Application Object Handles
Application developers should be aware that the PKCS #11 object handle model is fully virtualized when using an HA slot. The application must not assume fixed handle numbers across instances of an application. A handle’s value remains consistent for the life of a process; but it might be a different value the next time the application is executed.
When you use an HA slot with your applications, the client behaves as follows when interacting with the application:
1.Intercept the call from the application.
2.Translate virtual object handles to physical object handles using the mappings specified by the virtual object table. The virtual object table is created and updated for the current session only, and only contains of list of the objects accessed in the current session.
3.Launch any required actions on the appropriate HSM or partition.
4.Receive the result from the HSM or partition and forward the result to your application,
5.Propagate any changes in objects on the physical HSM that performed the action to all of the other members of the HA group.
Virtual slots and virtual objects
When an application uses a non-HA physical slot, it addresses all objects in the slot by their physical object handles. When an application uses an HA slot, however, a virtual layer of abstraction overlays the underlying physical slots that make up the HA group, and the HA group is presented to the application as a virtual slot. This virtual slot contains virtual objects that have virtual object handles. The object handles in an HA slot are virtualized since the object handles on each of the underlying physical slots might be different from slot to slot. Furthermore, the physical object handles could change if a member of the HA group drops out (fails or loses communication) and is replaced.
The virtual object table
HA slots use a virtual object table to map the virtual objects in the virtual HA slot to the real objects in the physical slots that make up the HA group. The HA client builds a virtual object table for each application that loads the library. The table is ephemeral, and only exists for the current session. It is created and updated, if necessary, each time an application makes a request to access an object. To maximize performance and efficiency, the table only contains a list of the objects accessed in the current session. For example, the first time an application accesses an object after application start up, the table is created, a look up is performed to map the virtual object to its underlying physical objects, and an entry for the object is added to the table. For each subsequent request for that object, the data in the table is used and no look up is required. If the application then accesses a different object that is not listed in the table, a new look up is performed and the table is updated to add an entry for the new object.
C_FindObjects behavior and application performance
Since the client must perform a lookup to create the virtual object table, the way you use the C_FindObjects function can have a significant impact on the performance of your applications. For example, if you use the C_FindObjects function to ask for specific attributes, the client only needs to update the table to include the requested objects. If, however, you use the C_FindObjects function to find all objects, the client queries each HSM/partition in the group, for each object, to create the table. This can take a significant amount of time if the slot contains a large number of objects, or if the HA group includes many members.
To mitigate performance degradation when using the C_FindObjects function to list the objects on an HA slot, we recommend that you structure your applications to search by description, handles, or other attributes, rather than searching for all objects. Doing so minimizes the number of objects returned and the time required to create or update the table. If your application must find all objects, we recommend that you add the C_FindObjects all function call to the beginning of your application so that the table is built on application start up, so that the table is available to the application for all subsequent C_FindObjects function calls.
Example: Database Encryption
This section walks through a sample use case of some of the HA logic with a specific application – a transparent database encryption.
Typical Database Encryption Key Architecture
Database engines typically use a two-layered key architecture. At the top layer is a master encryption key that is the root of data protection. Losing this key is equivalent to losing the database, so it obviously needs to be highly durable. At the second layer are table keys used to protect table-spaces and/or columns. These table keys are stored with the database as blobs encrypted by the master encryption key (MEK). This architecture maps to the following operations on the HSM:
1.Initial generation of master key for each database.
2.Generation and encryption of table keys with the master key.
3.Decryption of table keys when the database needs to access encrypted elements.
4.Generation of new master keys during a re-key and then re-encrypting all table keys with it.
5.Generation and encryption of new table keys for storage in the database (often done in a software module).
The HSM is not involved in the use of table keys. Instead it provides the strong protection of the MEK which is used to protect the table keys. Users must follow backup procedures to ensure their MEK is as durable as the database itself (Backup and Restore Using a Luna Backup HSM (G5) ).
HSM High Availability with Database Encryption
When the HSMs are configured as an HA group, the database’s master key is automatically and transparently replicated to all the members when the key is created or re-keyed. If an HSM group member was offline or fails during the replication, it does not immediately receive a copy of the key. Instead the HA group proceeds after replicating to all of the active members. Once a member is re-joined to the group the HSM client automatically replicates the new master keys to the recovered member.
Before every re-key event, the user must ensure the HA group has sufficient redundancy. A re-key will succeed as long as one HA group member exists, but proceeding with too few HSMs will result in an availability risk. For example, proceeding with only one HSM means the new master key will be at risk since it exists only on a single HSM. Even with sufficient redundancy, Thales Group recommends maintaining an offline backup of a database’s master key.
HSM Load Balancing with Database Encryption
While a database is up and running, the master key exists on all members in the HA group. Requests to encrypt or decrypt table keys are distributed across the entire group. The load-balancing feature is able to deliver improved performance and scalability when the database requires a large number of accesses to the table keys. Most deployments will not need much load balancing as the typical database deployment results in a small number of table keys.
While the table keys are re-keyed, new keys are generated in the HSM and encrypted for storage in the database. Within an HA group, these keys are generated on the primary member and then replicated to the entire HA group, even though they exist on the HSM for only a moment. These events are infrequent enough that this extra replication has minimal impact.