Work Load Distribution and High Availability
Any number of ProtectServer 3 HSMs can work together in a system, which improves scalability, availability, reliability, and throughput. The built-in, configurable Work Load Distribution (WLD) mode relieves the application of its own load-sharing logic, allowing it to focus on its primary tasks. A High Availability (HA)/load-balancing setup improves both overall performance and reliability.
Work Load Distribution
In a load distribution design approach, work is balanced across a system by transferring units of work between processing modules. The demand placed on any particular module is thereby reduced. A well-balanced system results in an increase in the overall throughput of processing tasks.
Several components work together in a system that uses load distribution. In a SafeNet system, the load distribution scheme is called WLD. Within ProtectToolkit-C, a distribution engine portions work requests and distributes them among HSMs according to a distribution scheme. The tokens used within the scheme must be replicated across the HSMs, according to the system design. A good system design addresses throughput requirements, resource portioning, and fault tolerance/disaster recovery. The ctident utility establishes trust between HSMs that share tokens (see ProtectServer owner and identity certificates). The ctkmu utility replicates a token once trust has been established.
High Availability
Enterprises must keep their services reliably up and running, and HA provides the redundancy that this availability depends on. The HA feature keeps track of the commands sent to a session. If a session fails, ProtectToolkit-C establishes a new session and replays these commands, so that fail-over is transparent to the application.
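The record-and-replay idea can be illustrated with a minimal sketch. This is a toy model, not the ProtectToolkit-C implementation: the class names, the string commands, and the use of ConnectionError to signal a failed session are all assumptions made for illustration.

```python
class FakeHsmSession:
    """Hypothetical stand-in for a session on one physical HSM."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.state = []  # commands this session has processed

    def execute(self, command):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unavailable")
        self.state.append(command)
        return f"{self.name}:{command}"


class ReplayingSession:
    """Records every successful command and replays the log after a failure."""

    def __init__(self, open_session):
        self._open_session = open_session  # callable returning a fresh session
        self._log = []
        self._session = open_session()

    def execute(self, command):
        try:
            result = self._session.execute(command)
        except ConnectionError:
            # Session failed: open a replacement and rebuild its state
            # by replaying the recorded commands in their original order.
            self._session = self._open_session()
            for past_command in self._log:
                self._session.execute(past_command)
            result = self._session.execute(command)
        self._log.append(command)
        return result


# Usage: hsm0 fails after two commands; the third lands on hsm1.
hsm0, hsm1 = FakeHsmSession("hsm0"), FakeHsmSession("hsm1")
pool = iter([hsm0, hsm1])
session = ReplayingSession(lambda: next(pool))
session.execute("login")
session.execute("find_key")
hsm0.alive = False
result = session.execute("sign")
print(result)      # hsm1:sign
print(hsm1.state)  # ['login', 'find_key', 'sign']
```

The caller never sees the failure: the replacement session receives the same command history before the pending command is retried.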
ProtectToolkit-C configuration
To enable WLD or HA, ProtectToolkit-C must be configured to operate in WLD or HA mode. Refer to Operation in WLD Mode and Operation in HA Mode for details on each.
Note
WLD and HA mode are enabled and disabled as separate modes for the ProtectServer 3 HSM, but HA mode can be considered WLD with recovery logic added to provide redundancy and reliability; that is, HA depends on WLD to manage failed HSMs and allocate new sessions to them.
When applications use the ProtectToolkit-C interface in WLD/HA mode, the system of physical HSMs appears as a single virtual HSM. ProtectToolkit-C uses virtual WLD slots to achieve this. To use a WLD slot, applications make the standard PKCS#11 function calls. The distribution engine distributes sessions across the physical HSM slots associated with the WLD slot (see WLD system setup).
WLD slots
A WLD slot is a virtual PKCS#11 slot. One or more ‘real’ (physical) HSM slots, possibly located across multiple devices, are associated with each WLD slot. Each WLD slot must be configured by the user (see Configuring WLD slots). For a physical HSM slot to be associated with a WLD slot, its token label must match the token label of the WLD slot. Each WLD slot token label must be unique. The distribution engine uses the token label to determine the underlying physical HSM slots across which to share the workload.
Note
The HA system cannot support more than 16 slots. The Administrator must configure no more than 16 WLD slots, numbered from 00 to 15 inclusive.
In WLD/HA mode, token and session objects on a WLD slot are only visible to the session that generated the objects.
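The label-matching rule and the 16-slot limit described in this section can be sketched as follows. This is a simplified model under stated assumptions: the function name, the (device, slot, label) tuples, and the example labels are hypothetical, and the real association is made through ProtectToolkit-C configuration rather than code like this.

```python
from collections import defaultdict

MAX_WLD_SLOTS = 16  # WLD slots are numbered 00 to 15 inclusive


def resolve_wld_slots(wld_labels, physical_slots):
    """Map each WLD slot number to the physical slots sharing its token label.

    wld_labels: list of token labels, one per WLD slot, in slot order.
    physical_slots: iterable of (device, slot_id, token_label) tuples.
    """
    if len(wld_labels) > MAX_WLD_SLOTS:
        raise ValueError("no more than 16 WLD slots are supported")
    if len(set(wld_labels)) != len(wld_labels):
        raise ValueError("each WLD slot token label must be unique")

    # Group physical slots by the token label they carry.
    by_label = defaultdict(list)
    for device, slot_id, label in physical_slots:
        by_label[label].append((device, slot_id))

    mapping = {}
    for number, label in enumerate(wld_labels):
        members = by_label.get(label, [])
        if not members:
            # A WLD slot needs at least one physical slot behind it.
            raise ValueError(f"no physical slot carries label {label!r}")
        mapping[number] = {"label": label, "physical": members}
    return mapping


# Usage with hypothetical devices and labels.
physical = [
    ("hsm0", 0, "PAYMENTS"),
    ("hsm1", 0, "PAYMENTS"),
    ("hsm1", 1, "SIGNING"),
]
wld = resolve_wld_slots(["PAYMENTS", "SIGNING"], physical)
print(wld[0]["physical"])  # [('hsm0', 0), ('hsm1', 0)]
```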
Distribution scheme
Application requests are distributed on a per-session basis. When an application opens a session to a WLD slot, the distribution engine selects the initial physical HSM slot to service the open session request, according to the distribution scheme. Once the session has been opened, all subsequent requests on that session are routed to that same physical HSM slot. When an application opens subsequent sessions, the distribution engine randomly selects a physical HSM slot from those with the fewest sessions.
As multiple applications may be using the distribution engine, the scheme ensures that slots are not ‘victimized’ because of their position in the scheme. For example, if multiple applications are started one at a time, and each application requests a single session on the same WLD slot, randomization will ensure an even distribution of sessions across the available physical HSM slots.
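The selection rule described above can be sketched as a small simulation. It is a toy model rather than the actual engine: the class name, the slot names, and the choice to apply the same least-sessions rule to the first session are assumptions.

```python
import random
from collections import Counter


class DistributionEngine:
    """Toy per-session distributor: random choice among the least-loaded slots."""

    def __init__(self, physical_slots, rng=None):
        self.session_counts = {slot: 0 for slot in physical_slots}
        self.rng = rng or random.Random()

    def open_session(self):
        fewest = min(self.session_counts.values())
        candidates = [s for s, n in self.session_counts.items() if n == fewest]
        chosen = self.rng.choice(candidates)  # random tie-break avoids position bias
        self.session_counts[chosen] += 1
        return chosen  # all later requests on this session go to this slot

    def close_session(self, slot):
        self.session_counts[slot] -= 1


# Usage: one application opens nine sessions over three hypothetical slots.
slots = ["hsm0:slot0", "hsm1:slot0", "hsm2:slot0"]
engine = DistributionEngine(slots, random.Random(7))
assignments = [engine.open_session() for _ in range(9)]
totals = Counter(assignments)
print(totals)
```

Because each new session goes to a slot with the fewest sessions, nine sessions over three slots always end up three per slot; only the order of assignment varies. When several applications each open a single session through their own engine, every slot is tied at zero, so the random tie-break is what spreads those sessions across the slots instead of always favoring the first one.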
Token replication
ProtectToolkit-C supports replication of token information in a protected form to other SafeNet HSMs. The ctident utility is used to establish trust between HSMs that share tokens (see ProtectServer owner and identity certificates). The ctkmu utility is used to replicate a token once trust has been established. See Trust management and Token replication for more information.
Token replication must be performed by the user at configuration time. The WLD model works on a static configuration.
Caution
The tokens in WLD must always be consistent. The distribution engine does not check or ensure that the physical HSM tokens associated with a particular WLD token are consistent. If the state of the tokens is inconsistent or incorrect, inappropriate keys could be used. This could occur without notice and without incident.
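Because the distribution engine performs no consistency check, an administrator might compare replicas out of band. The following is a minimal sketch of such a comparison, assuming each replica's contents have already been exported as a mapping from object label to some fingerprint; the function name, the inventory format, and the fingerprint values are hypothetical and are not produced by any ProtectToolkit-C utility in this form.

```python
def find_inconsistencies(replicas):
    """Compare every replica's inventory against the first one.

    replicas: dict of physical slot name -> {object label: fingerprint}.
    Returns a dict of slot name -> differences (empty if all replicas match).
    """
    names = list(replicas)
    reference = replicas[names[0]]
    problems = {}
    for name in names[1:]:
        inventory = replicas[name]
        missing = sorted(set(reference) - set(inventory))
        extra = sorted(set(inventory) - set(reference))
        changed = sorted(
            label
            for label in set(reference) & set(inventory)
            if reference[label] != inventory[label]
        )
        if missing or extra or changed:
            problems[name] = {"missing": missing, "extra": extra, "changed": changed}
    return problems


# Usage with made-up inventories: hsm1 holds a stale copy of one key.
replicas = {
    "hsm0:slot0": {"kek": "a1b2", "mac-key": "c3d4"},
    "hsm1:slot0": {"kek": "a1b2", "mac-key": "ffff"},
    "hsm2:slot0": {"kek": "a1b2", "mac-key": "c3d4"},
}
report = find_inconsistencies(replicas)
print(report)  # {'hsm1:slot0': {'missing': [], 'extra': [], 'changed': ['mac-key']}}
```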