Performance

Luna SA 5.x contains a newer-generation internal HSM; for discussion purposes, SafeNet refers to this HSM as K6, and to the HSM inside Luna SA 4.x as K5. Both K5 and K6 rely on application-specific integrated circuits (ASICs) to accelerate cryptographic operations within the HSM, and each generation uses different ASICs. These ASICs use multiple “engines” (analogous to a computer having multiple central processing units) to spread load and thereby increase performance.

With Luna SA 4.x, a client application needs to create about 20 threads to achieve maximum RSA signing performance, based on 1024-bit RSA operations. At this number of threads, the ASIC within the K5 achieves optimal distribution of cryptographic operations across its multiple engines. The ASIC within the K6 has a different number of engines and a different algorithm for distributing load. To achieve maximum performance with Luna SA 5.0, a client application needs to create about 50 threads. With refinements made for Luna SA 5.2, this number is now about 30 threads.
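One way to find the thread count that suits a particular client is to sweep over several counts and compare throughput. The following is a minimal sketch only, not a SafeNet tool: `placeholder_sign`, `measure_ops_per_second`, and `sweep` are hypothetical names, and the placeholder computes a local HMAC rather than calling the HSM. To obtain meaningful numbers, replace it with your application's real signing call. With the local placeholder, Python threads will not show the scaling described above, because the work is CPU-bound in the client.

```python
import hashlib
import hmac
import time
from concurrent.futures import ThreadPoolExecutor


def placeholder_sign(data: bytes) -> bytes:
    # Hypothetical stand-in for a real HSM signing call; it does NOT touch an HSM.
    return hmac.new(b"demo-key", data, hashlib.sha256).digest()


def measure_ops_per_second(sign, threads: int, ops_per_thread: int = 200) -> float:
    """Run `sign` from `threads` concurrent workers and return total operations per second."""
    payload = b"x" * 32

    def worker(_index: int) -> None:
        for _ in range(ops_per_thread):
            sign(payload)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(worker, range(threads)))
    elapsed = time.perf_counter() - start
    return (threads * ops_per_thread) / elapsed


def sweep(sign, thread_counts=(1, 10, 20, 30, 50)) -> dict:
    """Return a mapping of thread count to measured operations per second."""
    return {n: measure_ops_per_second(sign, n) for n in thread_counts}


if __name__ == "__main__":
    for count, rate in sweep(placeholder_sign).items():
        print(f"{count:>3} threads: {rate:,.0f} ops/s")
```

The thread counts in the default sweep simply bracket the figures quoted above (about 20, 30, and 50); adjust them to suit your environment.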

 

 

Published performance figures for Luna SA generally reflect repeated single operations against a single object that is imported or looked up once, before all the operations are performed. This is the most favorable scenario, yielding the highest speed attainable with the equipment. Manufacturers generally take the same approach when publishing figures.

"Real life" performance figures are often lower because of additional overhead, such as where an object must be fetched before each operation, or where the current task switches constantly from one operation type to another (for example, sign and verify in combination).
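The cost of fetching an object before each operation can be illustrated by counting round trips. This is a sketch under stated assumptions: `FakeSession` and its `find_key` and `sign` methods are hypothetical stand-ins, not part of any Luna or PKCS#11 API, and each call is assumed to cost exactly one round trip to the HSM.

```python
class FakeSession:
    """Hypothetical session that only counts round trips; it performs no cryptography."""

    def __init__(self):
        self.round_trips = 0

    def find_key(self, label):
        self.round_trips += 1  # assumed: one round trip per object lookup
        return ("handle", label)

    def sign(self, handle, data):
        self.round_trips += 1  # assumed: one round trip per signing operation
        return b"signature-placeholder"


def sign_with_lookup_each_time(session, label, messages):
    # Fetches the key before every operation (the slower "real life" pattern).
    return [session.sign(session.find_key(label), m) for m in messages]


def sign_with_single_lookup(session, label, messages):
    # Looks the key up once, then reuses the handle (the pattern behind published figures).
    handle = session.find_key(label)
    return [session.sign(handle, m) for m in messages]


if __name__ == "__main__":
    messages = [b"msg"] * 100
    per_op, once = FakeSession(), FakeSession()
    sign_with_lookup_each_time(per_op, "my-key", messages)
    sign_with_single_lookup(once, "my-key", messages)
    print(per_op.round_trips, once.round_trips)
```

Under these assumptions, 100 signings cost 200 round trips when the key is fetched each time and 101 when it is looked up once.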

If you are using, for example, the supplied multitoken tool in a lab setting, note that it defaults to a packet size of 1 kilobyte for symmetric encrypt/decrypt operations, a modest size at which per-operation overhead is significant. To obtain performance figures closer to "real life" for your situation, set the test packet size to match the sizes that you expect to see in your intended application. For example, a packet size on the order of 256 bits for credit card numbers, versus 64 kilobytes or larger for high-throughput encryption, could show significantly different performance.
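The effect of packet size can be approximated with a simple model in which every operation pays a fixed overhead plus a transfer time proportional to its size. This is a rough sketch, not a measured characterization of Luna SA: the function name, the 200-microsecond overhead, and the 100 MB/s peak rate are all hypothetical values chosen only to show the shape of the curve.

```python
def effective_throughput(payload_bytes: int,
                         per_op_overhead_s: float,
                         peak_bytes_per_s: float) -> float:
    """Bytes per second achieved when each operation pays a fixed overhead."""
    time_per_op = per_op_overhead_s + payload_bytes / peak_bytes_per_s
    return payload_bytes / time_per_op


if __name__ == "__main__":
    OVERHEAD_S = 200e-6   # hypothetical fixed cost per operation
    PEAK_RATE = 100e6     # hypothetical peak rate in bytes per second
    # 32 bytes = 256 bits; 1024 bytes = the 1 kilobyte default; 65536 bytes = 64 kilobytes
    for size in (32, 1024, 65536):
        rate = effective_throughput(size, OVERHEAD_S, PEAK_RATE)
        print(f"{size:>6} bytes/op: {rate / 1e6:8.2f} MB/s")
```

In this model, small packets are dominated by the fixed overhead, while large packets approach the peak rate, which is why the test packet size should reflect the intended application.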

In an HA configuration (two or more HSMs in a redundant group), further overhead is introduced to replicate and synchronize across all members of the group. Therefore, the type of operation can have a very significant impact on performance: it matters whether a single initial replication is followed by a large volume of operations against a static object, or whether a new replication is required before each single operation.

 

HA Performance

For repetitive operations, such as a high volume of signings using the same key, an HA group can scale Luna SA performance linearly as HA group members are added. HA groups of 16 members have undergone long-term, full-throttle testing, with excellent results.

Keep in mind that simply adding more Luna SA appliances to an HA group is not an infallible recipe for endless performance improvement. For best overall performance, all HA group members should be driven near their individual performance "sweet spot", which for Luna SA 5.2 and later is around 30 simultaneous threads per HSM. If you assemble an HA group that is considerably larger than your server(s) can drive, then you might not achieve full performance from all members.

The best approach is an HA group sized to match both the capability of the application servers that will be driving the group and the expected loads, with an additional unit to provide capacity for bursts of traffic and for redundancy.
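As a rough illustration of this balance, the group size can be estimated from the number of simultaneous threads that the application servers can sustain. The rule below is an assumption made for illustration, not a SafeNet formula: it counts how many members can be fully driven at about 30 threads each, adds one spare unit, and does not model expected load. The function and parameter names are hypothetical.

```python
def ha_group_size(drivable_threads: int,
                  threads_per_hsm: int = 30,
                  spare_units: int = 1) -> int:
    """Estimate HA group size from the threads the application servers can drive.

    Assumed rule of thumb: members that can be kept near the ~30-thread
    sweet spot, plus spare capacity for bursts and redundancy.
    """
    if drivable_threads < 0 or threads_per_hsm <= 0 or spare_units < 0:
        raise ValueError("arguments must be non-negative, with threads_per_hsm > 0")
    fully_driven = drivable_threads // threads_per_hsm
    # Always keep at least one working member, even for very light clients.
    return max(fully_driven, 1) + spare_units


if __name__ == "__main__":
    for threads in (10, 100, 120):
        print(f"{threads} threads -> {ha_group_size(threads)} members")
```

Treat the result as a starting point for testing; the expected load and the nature of the operations still need to be measured in your own environment.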

 

Performance and the PE1746