HA Troubleshooting

If you encounter problems with an HA group, refer to this section.

Administration Tasks on HA Groups

Do not attempt to run administrative tasks, such as altering partition policies, on an HA group virtual slot. Virtual slots are intended for cryptographic operations only. You cannot use an HA group to make administrative changes to all partitions in the group simultaneously; the one exception is partition changepw, which works for HA from HSM Client 10.6.1 onward.
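As an illustration of this rule, a client-side tool could check an administrative operation before sending it to a virtual slot. The Python sketch below is hypothetical: the helper names and the operation label strings are assumptions, and only the 10.6.1 threshold for partition changepw comes from this section.

```python
# Hypothetical pre-flight check (not a Thales API): may this administrative
# operation be sent to an HA group virtual slot?
from typing import Dict, Tuple

# From the text: the only exception is partition changepw, supported for HA
# from HSM Client 10.6.1 onward. Operation labels are illustrative strings.
HA_ADMIN_EXCEPTIONS: Dict[str, Tuple[int, ...]] = {
    "partition changepw": (10, 6, 1),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '10.6.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def allowed_on_ha_slot(operation: str, client_version: str) -> bool:
    """True only for the documented exception on a new-enough client."""
    minimum = HA_ADMIN_EXCEPTIONS.get(operation)
    if minimum is None:
        # Any other administrative task must be run on each member partition.
        return False
    return parse_version(client_version) >= minimum


print(allowed_on_ha_slot("partition changepw", "10.6.1"))      # True
print(allowed_on_ha_slot("partition changepw", "10.5.0"))      # False
print(allowed_on_ha_slot("alter partition policy", "10.7.0"))  # False
```

Comparing tuples rather than strings keeps "10.10.0" correctly newer than "10.6.1".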

Unique Object IDs (OUID)

If two applications using the same HA group modify the same object through different members, the fingerprints of that object on those members might conflict.
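To see how such a conflict can arise, the following Python sketch models two HA members holding replicated copies of one object under the same OUID. It is a simplified, hypothetical model: the OUID value and attribute names are illustrative, and SHA-256 over the attributes only stands in for the real object fingerprint, which this section does not describe.

```python
# Hypothetical model: two HA members hold identical copies of one object
# (same OUID). SHA-256 over the attributes is a stand-in fingerprint only.
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Stand-in fingerprint: SHA-256 of the attributes in stable JSON order."""
    encoded = json.dumps(attributes, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


ouid = "0x0001"  # illustrative OUID value
member_a = {ouid: {"label": "signing-key", "modifiable": True}}
member_b = {ouid: {"label": "signing-key", "modifiable": True}}
in_sync = fingerprint(member_a[ouid]) == fingerprint(member_b[ouid])

# Application 1 modifies the object through member A, application 2 through B.
member_a[ouid]["label"] = "signing-key-app1"
member_b[ouid]["label"] = "signing-key-app2"

# Same OUID, different contents: the fingerprints no longer agree.
conflict = fingerprint(member_a[ouid]) != fingerprint(member_b[ouid])
print("in sync before:", in_sync)      # in sync before: True
print("conflict after:", conflict)     # conflict after: True
```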

Client-Side Limitations

New features or capabilities, new cryptographic mechanisms added by a firmware update, and previously usable mechanisms that become restricted for security reasons can all affect how an HA group works when the HSM Client version is older than the HSM firmware. Luna Clients are "universal" in the sense that they work fully with current Luna HSMs and partitions, with earlier versions, and with cloud crypto solutions (DPoD Luna Cloud HSM service), but a client cannot be aware of HSM versions that had not yet been developed when that client was released.

Client-Side Failures

Any client failure (such as an operating system problem) that does not involve corrupted or removed files should resolve itself when the client is rebooted.

If the client workstation seems to be working fine otherwise, but you have lost visibility of the HSMs in LunaCM or your client, try the following remedies:

> Verify that the Thales drivers are running, and retry.

> Reboot the client workstation.

> Restore your client configuration from backup.

> Re-install HSM Client and reconfigure the HA group.

For Luna PCIe HSM 7, the client is the HSM host. If HA has been working, a sudden failure is most likely caused either by an OS or driver problem (resolved by restarting) or by file corruption (resolved by re-installing). If you must re-install, you must also recreate and reconfigure the HA group.

Failures Between the HSM Appliance and Client

The only failure likely to occur between a Luna Network HSM 7 (or multiple HSMs) and a client computer coordinating an HA group is a network failure. In that case, the key factor is whether the failure occurred near the client or near one or more of the Luna Network HSM 7 appliances.

If the failure occurs near the client and you have not set up port bonding on the client, the client loses sight of all HA group members and the application fails. The application resumes according to its own timeouts and error-handling capabilities, and HA resumes automatically if the members reappear within the recovery window that you set.

If the failure occurs near a Luna Network HSM 7 member of the HA group, that member drops out of the group until the network failure is cleared. The client can still see the other members, so normal failover occurs.
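The two failure locations can be summarized in a small Python model of what the client sees. This is a hypothetical sketch, not the HA client's logic: member names are illustrative, requests simply go to the first visible member (ignoring how the real HA client distributes work), and the recovery-window check just compares durations.

```python
# Hypothetical model of an HA group after a network failure, seen from the
# client. Member names are illustrative; this is not the HA client's logic.
from typing import Dict, List


def reachable_members(members: List[str], client_link_up: bool,
                      member_links: Dict[str, bool]) -> List[str]:
    """Return the members the client can still see."""
    if not client_link_up:
        # Failure near the client (no port bonding): every member is lost.
        return []
    # Failure near an appliance: only that appliance drops out of the group.
    return [m for m in members if member_links.get(m, True)]


def route_request(members: List[str], client_link_up: bool,
                  member_links: Dict[str, bool]) -> str:
    """Send a request to the first visible member (simplified failover)."""
    visible = reachable_members(members, client_link_up, member_links)
    if not visible:
        # The application must rely on its own timeouts and error handling.
        return "error: no HA members visible"
    return f"served by {visible[0]}"


def ha_resumes_automatically(outage_s: float, recovery_window_s: float) -> bool:
    """HA recovers on its own only if members return within the window."""
    return outage_s <= recovery_window_s


group = ["hsm-a", "hsm-b", "hsm-c"]
print(route_request(group, True, {"hsm-a": False}))  # served by hsm-b
print(route_request(group, False, {}))               # error: no HA members visible
```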

Avoid direct access to individual HA group members when securing with STC

The best way to ensure this is to turn ON the HAonly setting in the configuration file, so that only the HA virtual slot is visible and all requests and responses are handled transparently by the HA system (see Configuration File Summary). If you cannot avoid accessing an individual HA member slot directly, log out of it before your application attempts to use the HA virtual slot. This is especially important when STC is in use (see Client-Partition Connections).
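One way to confirm the setting is to inspect the configuration file itself. The Python sketch below assumes the Linux-style Chrystoki.conf layout, in which the setting appears as HAOnly = 1 inside an HAConfiguration section; those section and key names, and the helper function, are assumptions to verify against the Configuration File Summary for your client version.

```python
# Hypothetical helper: reports whether HAonly is enabled in a client
# configuration file. ASSUMPTION: a layout such as
#   HAConfiguration = { HAOnly = 1; ... }
# Confirm section and key names against the Configuration File Summary.
import re


def ha_only_enabled(config_text: str) -> bool:
    """Return True if the HAConfiguration section sets HAOnly to 1."""
    # Non-greedy match up to the first closing brace of the section.
    section = re.search(r"HAConfiguration\s*=\s*\{(.*?)\}", config_text, re.S)
    if section is None:
        return False  # no HA section found: treat HAonly as off
    setting = re.search(r"\bHAOnly\s*=\s*(\d+)\s*;", section.group(1), re.I)
    return setting is not None and setting.group(1) == "1"


sample = "HAConfiguration = {\n   HAOnly = 1;\n   reconnAtt = -1;\n}\n"
print(ha_only_enabled(sample))  # True
```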

Each HSM keeps track of the appID registered against each remote connection, and rejects any attempt by the same client to create a new session with a different appID. That is, only one access ID is permitted per STC channel. If a client opens a session directly to an individual HA member partition, an ID is assigned. If the client then attempts an operation through the HA virtual slot, random appIDs are assigned to each member partition for the open channel as part of that process. Because one of those member partitions already holds the earlier ID, the HSM responds with CKR_ACCESS_ID_ALREADY_EXISTS and the operation fails.

To avoid this problem, log out of any individual member slot before invoking the HA slot.
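The appID rule and the logout remedy can be modeled in a few lines of Python. This is a hypothetical simulation, not a Luna API: the class and function names are invented for illustration, the direct session uses appID 0 only so that it can never collide with the random IDs, and the assumption that logging out clears the appID registered on the channel is inferred from the remedy above.

```python
# Hypothetical model of the one-appID-per-STC-channel rule. Names are
# illustrative; this is not a Luna or PKCS #11 API.
import random
from typing import List, Optional

CKR_OK = "CKR_OK"
CKR_ACCESS_ID_ALREADY_EXISTS = "CKR_ACCESS_ID_ALREADY_EXISTS"


class MemberHSM:
    def __init__(self, name: str) -> None:
        self.name = name
        self.app_id: Optional[int] = None  # one access ID per STC channel

    def open_session(self, app_id: int) -> str:
        if self.app_id is not None and self.app_id != app_id:
            return CKR_ACCESS_ID_ALREADY_EXISTS
        self.app_id = app_id
        return CKR_OK

    def logout(self) -> None:
        self.app_id = None  # ASSUMPTION: logout releases the registered ID


def use_ha_slot(members: List[MemberHSM]) -> str:
    """The HA slot assigns a random appID to each member partition."""
    for member in members:
        result = member.open_session(random.randint(1, 2**32 - 1))
        if result != CKR_OK:
            return result
    return CKR_OK


def scenario(log_out_first: bool) -> str:
    members = [MemberHSM("member-1"), MemberHSM("member-2")]
    members[1].open_session(app_id=0)  # direct access to one member slot
    if log_out_first:
        members[1].logout()
    return use_ha_slot(members)


print(scenario(log_out_first=False))  # CKR_ACCESS_ID_ALREADY_EXISTS
print(scenario(log_out_first=True))   # CKR_OK
```

Each scenario builds fresh members, because a failed HA attempt in this model leaves random IDs registered on the members it reached first.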

TIP   Security Note: Cloning policies (0 and 4) permit or deny the ability to securely copy keys and objects into and out of a partition.
The Key Management Functions policy (28) controls the ability to create, delete, generate, derive, or modify cryptographic objects in the current partition.

These controls are independent of each other. With Key Management functions denied, you can still clone objects into and out of partitions where the Cloning policy is allowed. Thus, HA (high availability) operations can clone keys into a partition that disallows Key Management functions (creation, deletion, and so on). Cloning a key or object into a partition is not considered creation, because the key or object already existed within the security/cloning domain that encompasses the partition.

Ultimately, security administrators define where keys can exist by controlling distribution of the security/cloning domain and by defining policies around those keys.

Additionally, key owners can choose to make their keys non-modifiable and non-extractable if their use case calls for it.
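The independence of the Cloning and Key Management controls described in this note can be sketched as a Python permission check. The policy numbers 0, 4, and 28 come from the note; everything else (the function and operation names, and letting the caller choose which cloning policy governs an object) is a hypothetical simplification.

```python
# Hypothetical model: Cloning policies (0, 4) and the Key Management Functions
# policy (28) are checked independently. Names are illustrative only.
from typing import Dict

CLONING_POLICIES = (0, 4)
KEY_MANAGEMENT_POLICY = 28
KEY_MANAGEMENT_OPS = {"create", "delete", "generate", "derive", "modify"}


def permitted(operation: str, policies: Dict[int, bool],
              cloning_policy: int = 0) -> bool:
    """Check one operation against partition policies (True = allowed)."""
    if operation == "clone":
        # Which of 0 or 4 governs an object is not stated in the note, so the
        # caller names it; policy 28 is deliberately not consulted here.
        if cloning_policy not in CLONING_POLICIES:
            raise ValueError(f"not a cloning policy: {cloning_policy}")
        return policies.get(cloning_policy, False)
    if operation in KEY_MANAGEMENT_OPS:
        return policies.get(KEY_MANAGEMENT_POLICY, False)
    raise ValueError(f"unknown operation: {operation}")


# A partition that allows cloning but denies Key Management functions.
partition = {0: True, 4: True, 28: False}
print(permitted("clone", partition))     # True: HA can still replicate keys
print(permitted("generate", partition))  # False: no new keys can be created
```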