HA Troubleshooting

If you encounter problems with an HA group, refer to this section.

Cryptographic Operations Blocked During Remote PED Operations When Audit Logging Is Enabled

With audit logging enabled on the HSM, crypto operations are blocked on all application partitions during Remote PED operations. During this time, requests sent to HA member partitions on this HSM will not fail over to other members. When the Remote PED operation is complete, all crypto operations resume normally. If your application has its own timeout programmed, it may incorrectly conclude that the entire HA group has failed.

Using Luna HSM Client 10.7.2 or newer, you can configure the ProbeTimeout setting in the Chrystoki.conf/crystoki.ini file to trigger an HA failover after a specified time. This allows operations to continue normally during Remote PED operations.
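
For applications that implement their own timeout, the following Python sketch shows one way to avoid declaring the entire HA group failed after a single stalled call. The names TransientStall and call_with_patience, and the retry parameters, are hypothetical and chosen for illustration only; they are not part of Luna HSM Client. Suitable retry counts and delays depend on how long Remote PED operations take in your environment.

```python
import time


class TransientStall(Exception):
    """Hypothetical error: a crypto call exceeded the application's own timeout."""


def call_with_patience(operation, attempts=5, initial_delay=0.5, sleep=time.sleep):
    """Retry `operation` with exponential backoff.

    Only after `attempts` consecutive stalls is the error re-raised, at which
    point the caller may treat the HA group as unavailable.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientStall:
            if attempt == attempts:
                raise  # retries exhausted; let the caller decide what to do
            sleep(delay)
            delay *= 2  # wait longer before each subsequent retry
```

The sleep parameter is injectable so that the backoff can be exercised in tests without real delays.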

Administration Tasks on HA Groups

Do not attempt to run administrative tasks (such as altering partition policies) on an HA group virtual slot. These virtual slots are intended for cryptographic operations only. It is not possible to use an HA group to make administrative changes to all partitions in the group simultaneously; the exception is lunacm:> partition changepw using Luna HSM Client 10.7.0 or newer.

Unique Object IDs (OUID)

If two applications that use the same HA group modify the same object through different members, the object fingerprint might conflict.

Client-Side Limitations

New features or abilities, new cryptographic mechanisms added by a firmware update, or previously usable mechanisms that become restricted for security reasons can affect the operation of an HA group when an older Luna HSM Client version is in use. Luna HSM Clients are "universal" in the sense that they work fully with current and earlier Luna HSMs and partitions, as well as with cloud crypto solutions (DPoD Luna Cloud HSM service), but a client version cannot be aware of HSM versions that had not yet been developed when that client was released.

Client-Side Failures

Any failure of the client (such as operating system problems) that does not involve corruption or removal of files should resolve itself when the client is rebooted.

If the client workstation seems to be working fine otherwise, but you have lost visibility of the HSMs in LunaCM or your client, try the following remedies:

>verify that the Thales drivers are running, and retry

>reboot the client workstation

>restore your client configuration from backup

>re-install Luna HSM Client and re-configure the HA group

Failures Between the HSM Appliance and Client

The only failure likely to occur between a Luna Network HSM 7 (or multiple HSMs) and a client computer coordinating an HA group is a network failure. In that case, the salient factor is whether the failure occurred near the client or near one (or more) of the Luna Network HSM 7 appliances.

If the failure occurs near the client, and you have not set up port bonding on the client, then the client loses sight of all HA group members and the application fails. The application resumes according to its timeouts and error-handling capabilities, and HA resumes automatically if the members reappear within the recovery window that you have set.

If the failure occurs near a Luna Network HSM 7 member of the HA group, then that member disappears from the group until the network failure is cleared, but the client can still see other members, and normal failover occurs.

Avoid Direct Access to Individual HA Group Members When Securing with STC

This is best ensured by turning the HAonly setting ON in the configuration file, so that only the HA virtual slot is visible and all requests and responses are handled transparently by the HA system (see Configuration File Summary). If you cannot avoid directly accessing an individual HA member slot, be sure to log out of it before your application attempts to use the HA virtual slot. This is especially important when STC is invoked (see Client-Partition Connections).
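
As a quick sanity check, a script can confirm that the setting is enabled before the application starts. The Python sketch below is illustrative only. It assumes the brace-delimited "Section = { key = value; }" layout of Chrystoki.conf, and it assumes that the setting appears as HAOnly = 1 inside a section named HAConfiguration; verify both against Configuration File Summary for your client version. It does not handle the crystoki.ini layout used on Windows. The function names are hypothetical.

```python
import re

# Matches "Name = { ... }" blocks and "key = value;" entries inside them.
SECTION_RE = re.compile(r"(\w+)\s*=\s*\{(.*?)\}", re.DOTALL)
ENTRY_RE = re.compile(r"(\w+)\s*=\s*([^;]*);")


def parse_conf(text):
    """Return {section: {key: value}} with section and key names lower-cased."""
    sections = {}
    for name, body in SECTION_RE.findall(text):
        entries = {key.lower(): value.strip() for key, value in ENTRY_RE.findall(body)}
        sections[name.lower()] = entries
    return sections


def ha_only_enabled(text):
    """True if the (assumed) HAConfiguration section sets HAOnly to 1."""
    ha_section = parse_conf(text).get("haconfiguration", {})
    return ha_section.get("haonly") == "1"
```

Names are compared case-insensitively so that spelling variants such as HAonly and HAOnly are treated alike.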

Each HSM keeps track of any appid registered against a remote connection, and rejects any attempt to create a new session with a different appid from the same client. That is, only one access ID is permitted per STC channel. If a client opens a session directly to an individual HA member partition, an ID is assigned. If the client next attempts an operation via the HA virtual slot, then as part of that process, random appids are assigned to each member partition for the open channel. Because one of those member partitions already has the earlier ID, the HSM responds with CKR_ACCESS_ID_ALREADY_EXISTS and the operation fails.

To avoid this problem, log out of any individual member slot before invoking the HA slot.
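
The sequence described above can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not the HSM's actual implementation: the class and function names are hypothetical, AccessIdConflict merely stands in for CKR_ACCESS_ID_ALREADY_EXISTS, and the model assumes that logging out releases the access ID held for the channel.

```python
import secrets


class AccessIdConflict(Exception):
    """Stands in for CKR_ACCESS_ID_ALREADY_EXISTS in this toy model."""


class MemberPartition:
    """Toy HA member that permits one access ID per STC channel."""

    def __init__(self, name):
        self.name = name
        self.access_id = None

    def open_session(self, access_id):
        # Reject a session that arrives with a different ID on the same channel.
        if self.access_id is not None and self.access_id != access_id:
            raise AccessIdConflict(self.name)
        self.access_id = access_id

    def logout(self):
        # Assumption: logging out releases the ID held for this channel.
        self.access_id = None


def open_ha_virtual_slot(members):
    """Model the HA layer assigning a random access ID to each member."""
    for member in members:
        member.open_session(secrets.token_hex(8))
```

In this model, opening a member directly and then calling open_ha_virtual_slot raises AccessIdConflict, whereas calling logout on that member first lets the HA call succeed.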

Some Security Settings and Implications

TIP   Security Note - Cloning policies (0 and 4) permit or deny the ability to securely copy keys and objects into and out of a partition.
The Key Management Functions policy (28) controls the ability to create, delete, generate, derive, or modify cryptographic objects in the current partition.

These controls are independent of each other. With Key Management Functions denied, you can still clone objects into and out of partitions where the Cloning policy is allowed. Thus, HA (high availability) operation can clone keys into a partition that disallows Key Management Functions (creation, deletion, and so on). Cloning a key or object into a partition is not considered creation, because the key or object already existed within the security / cloning domain that encompasses the partition.
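
The independence of the two controls can be expressed as a small decision model. The Python sketch below is a simplification for illustration: the type and function names are hypothetical, the two cloning policies (0 and 4) are collapsed into a single flag, and real policy evaluation on the HSM involves more conditions than are shown here.

```python
from dataclasses import dataclass


@dataclass
class PartitionPolicies:
    cloning_allowed: bool         # stands in for cloning policies 0 and 4
    key_management_allowed: bool  # stands in for Key Management Functions policy 28


def can_create_key(policies):
    """Creating, deleting, or modifying objects depends only on policy 28."""
    return policies.key_management_allowed


def can_clone_in(policies, same_cloning_domain):
    """Cloning in depends only on the cloning policy and a shared domain."""
    return policies.cloning_allowed and same_cloning_domain


# A partition that denies key management but still accepts HA-cloned keys.
locked_down = PartitionPolicies(cloning_allowed=True, key_management_allowed=False)
```

With locked_down, can_create_key returns False while can_clone_in returns True for a source in the same cloning domain, which matches the HA behavior described above.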

Ultimately, security administrators define where keys can exist by controlling distribution of the security / cloning domain, and by defining policies around those keys.

Additionally, key owners can choose to make their keys non-modifiable and non-extractable, if your use case calls for those options.