For repetitive operations, like a high volume of signings using the same key, an HA group can expand SafeNet Network HSM performance in linear fashion as HA group members are added. HA groups of 16 members have undergone long-term, full-throttle testing, with excellent results.
Do keep in mind that simply adding more and more SafeNet Network HSM appliances to an HA group is not an infallible recipe for endless performance improvement. For best overall performance, all HA group members should be driven near their individual performance "sweet spot", which for SafeNet Network HSM 5.2 and later is around 30 simultaneous threads per HSM. If you assemble an HA group that is considerably larger than your server(s) can drive, then you might not achieve full performance from all members.
The best approach is an HA group balanced in size for the capability of the application servers that will be driving the group and for the expected loads, with an additional unit to provide capacity for bursts of traffic and for redundancy.
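As a rough illustration of the sizing guidance above, the following sketch turns it into arithmetic. It assumes the figure of about 30 threads per HSM and one spare unit; the function name and parameters are hypothetical, and real sizing should be confirmed by measurement in your own environment.

```python
import math

# Approximate per-HSM "sweet spot" quoted above for SafeNet Network HSM 5.2+.
SWEET_SPOT_THREADS = 30


def suggested_ha_members(client_threads: int, spare_units: int = 1) -> int:
    """Estimate an HA group size from the threads the servers can sustain.

    Hypothetical helper: members needed to keep each HSM near its sweet
    spot, plus spare unit(s) for traffic bursts and redundancy.
    """
    if client_threads < 1:
        raise ValueError("client_threads must be at least 1")
    driven_members = math.ceil(client_threads / SWEET_SPOT_THREADS)
    return driven_members + spare_units


if __name__ == "__main__":
    for threads in (30, 100, 240):
        print(threads, "threads ->", suggested_ha_members(threads), "members")
```

A group much larger than this estimate would leave some members under-driven, which is the situation the guidance above warns against.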
SafeNet Network HSM 6.x in HA can provide performance improvement for asymmetric single-part operations. Gigabit ethernet connections are recommended to maximize performance. For example, we have seen as much as a doubling of asymmetric single-part operations in a two-member group in a controlled laboratory environment (without crossing subnet boundaries, without competing traffic or other latency-inducing factors).
Multi-part operations are not load-balanced by the SafeNet HA due to the overhead that would be needed to perform context replication for each part of a multi-part operation.
Single-part cryptographic operations are load-balanced by the SafeNet HA functionality under most circumstances (see the note on the PE1746Enabled setting). Load-balancing these operations provides both scalability (better net throughput of operations) and redundancy by supporting transparent fail-over.
Performance is also affected by the kind of operation you are performing. HA is better for performance when all HSM operations are performed on keys and material that reside within the HSM. This changes if part of the operation involves importing and unwrapping of keys; it can be instructive to consider what happens when such HSM operations are performed both with and without HA.
With HA:
•One encryption (to wrap the key)
•One decryption in the HSM (to unwrap the key)
•Object Creation on the HSM (the unwrapped key is created and stored as a key object)
•Key Replication happens for HA
–RSA 4096-bit operation used to derive a shared secret between the HSMs
–Encryption of the key on the primary HA member using the shared secret
–Decryption of the key on the secondary HA member HSM using the shared secret
–Object Creation on the second HA member
•One encryption (uses the unwrapped key object to encrypt the data)
Without HA:
•One encryption (to wrap the key)
•One decryption in the HSM (to unwrap the key)
•Object Creation on the HSM (the unwrapped key is created and stored as a key object)
•One encryption (uses the unwrapped key object to encrypt the data)
From the above it is apparent that, with HA, many more operations are performed. Most significant in the above case are the RSA 4096-bit operation and the additional Object Creation performed. Those two operations are by far the slowest operations in the list, and so this type of task would have much better performance without HA.
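The comparison can be made concrete by tallying the two sequences listed above (the longer one with HA key replication, the shorter one without). This is only a counting sketch for a two-member group. The step names are paraphrased from the lists, and no timings are implied.

```python
# Steps for the unwrap-then-encrypt task without HA, paraphrased from above.
WITHOUT_HA = [
    "encrypt (wrap key)",
    "decrypt in HSM (unwrap key)",
    "object creation",
    "encrypt data with unwrapped key",
]

# Extra steps that HA key replication adds in a two-member group.
REPLICATION = [
    "RSA 4096-bit shared-secret derivation",
    "encrypt key on primary",
    "decrypt key on secondary",
    "object creation on secondary",
]

# With HA, replication happens after the key object is created on the primary.
WITH_HA = WITHOUT_HA[:3] + REPLICATION + WITHOUT_HA[3:]


def slow_steps(ops):
    """Return the steps the text identifies as slowest (RSA 4096, object creation)."""
    return [op for op in ops if "RSA 4096" in op or "object creation" in op]


if __name__ == "__main__":
    print("without HA:", len(WITHOUT_HA), "steps,", len(slow_steps(WITHOUT_HA)), "slow")
    print("with HA:   ", len(WITH_HA), "steps,", len(slow_steps(WITH_HA)), "slow")
```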
By contrast, if the task had made use of objects already within the HSM, then at most a single synchronization would have propagated the objects to all HA members, and all subsequent operations would have seen a performance boost from HA operation. The crucial consideration is whether the objects being manipulated are constant or are constantly being replaced.
Performance implications of HA in general, and of C_FindObjects in particular, are discussed in detail at Using HA With Your Applications.
Briefly, C_FindObjects can be called with an option to search for ALL objects, or to search more specifically for a subset of all objects in the HA group. The search for ALL objects can be lengthy and initially slows performance of the HA group.
If your application is the type that launches and then remains running while performing ongoing crypto operations, then a call to C_FindObjects ALL at the beginning is just a momentary performance hit and then your application benefits from maximum HA performance thereafter.
However, if your application is programmed to launch, run a small number of crypto operations and then close, the use of C_FindObjects ALL can impose a significant performance penalty. For that kind of application, you should use C_FindObjects with very specific search parameters for the fastest possible creation of the minimum Virtual Objects Table necessary for your application.
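The difference between a search for ALL objects and a specific search can be pictured with a small model of PKCS #11 template matching, in which an empty template matches every object. This is a pure-Python illustration, not a call into the SafeNet client library. The object labels are hypothetical, and the function only mimics the matching rule.

```python
def find_objects(objects, template):
    """Model of template matching: keep objects whose attributes match
    every entry in the template. An empty template matches everything."""
    return [
        obj for obj in objects
        if all(obj.get(attr) == value for attr, value in template.items())
    ]


# Hypothetical contents of an HA group's token (labels are made up).
TOKEN_OBJECTS = [
    {"CKA_CLASS": "CKO_SECRET_KEY", "CKA_LABEL": "app-aes-key"},
    {"CKA_CLASS": "CKO_PRIVATE_KEY", "CKA_LABEL": "signing-key"},
    {"CKA_CLASS": "CKO_PUBLIC_KEY", "CKA_LABEL": "signing-key"},
    {"CKA_CLASS": "CKO_CERTIFICATE", "CKA_LABEL": "other-app-cert"},
]

# Search for ALL objects: every object ends up in the cached object list.
everything = find_objects(TOKEN_OBJECTS, {})

# Specific search: only the object this application actually needs.
needed = find_objects(
    TOKEN_OBJECTS,
    {"CKA_CLASS": "CKO_PRIVATE_KEY", "CKA_LABEL": "signing-key"},
)

if __name__ == "__main__":
    print(len(everything), "objects found by the ALL search")
    print(len(needed), "object found by the specific search")
```

The narrower the template, the fewer objects have to be gathered before a short-lived application can begin its crypto operations.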
Note: The cached object list is ephemeral, and only exists for the current session. If you restart the application, HA must recreate the object list cache. Best practice is to execute C_FindObjects to create the cached object list at application startup.
Note: Beginning with release 6.2.1, the initial call to C_FindObjects ALL is optimized, but we still recommend a more limited search in place of C_FindObjects with the "all" option wherever one will serve.
The SafeNet HSM client accepts a configuration file entry known as “PE1746Enabled”. This configures the way SafeNet HSM handles symmetric encryption and decryption operations for certain algorithms, namely ECB and CBC modes of AES and TDES. By default (beginning with release 5.4), the [Misc] section of the configuration file contains the entry “PE1746Enabled=0”, or the value is unset.
To enable this configuration option, set “PE1746Enabled=1” in the [Misc] section.
When set, this value configures the library to use fast-path cryptography directly to symmetric encryption engines. This has the advantage of enabling high-performance bulk cryptography, but has the disadvantage of creating a direct context between the client library and the engine. This means that the library cannot easily load-balance operations across HSMs. This mode should be used only by applications that perform large data encryption operations (data sizes greater than 1K).
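To check which value is in effect, a script can read the entry from the configuration text. The sketch below assumes an INI-style layout, as the [Misc] notation suggests. The file name, location and exact syntax of the client configuration on your platform may differ, and the helper name is hypothetical.

```python
import configparser


def pe1746_enabled(config_text: str) -> bool:
    """Return True only if [Misc] PE1746Enabled is set to 1.

    Assumes INI-style text. A missing section or entry is treated as 0,
    matching the "PE1746Enabled=0, or unset" default described above.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getint("Misc", "PE1746Enabled", fallback=0) == 1


if __name__ == "__main__":
    sample = "[Misc]\nPE1746Enabled=1\n"
    print("fast path enabled:", pe1746_enabled(sample))
```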
When PE1746Enabled=0, the library uses its standard command path to the HSM. The advantage of this is that all single-part cryptographic operations can be load-balanced. The disadvantage is lower performance for larger data sizes. Applications should maintain this setting whenever possible to ensure the scalability and fail-over advantages.
In summary:
•When PE1746Enabled=1, load-balancing is not used for symmetric cryptographic operations; instead, all symmetric operations are directed at the client’s primary member. You see better performance, but no scalability across HSMs.
•When PE1746Enabled=0, all single-part cryptographic operations (with data size less than or equal to 1K) are load-balanced.
A single-part crypto operation is typically one that has small data sizes (< 1Kb), but is also dependent on how the library makes its API calls (PKCS #11 supports explicit multi-part API calls through the use of C_EncryptUpdate and C_DecryptUpdate). When an application uses the “Update” APIs the cryptographic operation is, by definition, multi-part. When the application does not use these APIs (i.e. uses C_EncryptInit followed by C_Encrypt) then an operation is single-part up to a 64KB data size.
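The rules described in this section can be condensed into a small decision function. This is a simplified model of the text, not of the library's actual internals. The function and parameter names are hypothetical, and the data-size thresholds mentioned above are deliberately left out.

```python
def is_load_balanced(*, symmetric: bool, uses_update_api: bool,
                     pe1746_enabled: bool) -> bool:
    """Simplified model of whether HA load-balances an operation.

    symmetric       -- True for the symmetric operations affected by the
                       setting (ECB and CBC modes of AES and TDES)
    uses_update_api -- True if C_EncryptUpdate / C_DecryptUpdate is used,
                       which makes the operation multi-part by definition
    pe1746_enabled  -- True when PE1746Enabled=1
    """
    if uses_update_api:
        return False  # multi-part operations are never load-balanced
    if symmetric and pe1746_enabled:
        return False  # fast path: directed at the client's primary member
    return True       # single-part operation on the standard command path


if __name__ == "__main__":
    print(is_load_balanced(symmetric=True, uses_update_api=False,
                           pe1746_enabled=False))
```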
Additionally, the HSM has a limit of 1000 contexts for SafeXcel 1746 operations, which becomes a consideration when many client threads run concurrently. (LHSM-12630)
Whenever possible, run your application with PE1746Enabled=0.