Operation in HA Mode

In HA mode, the Cryptoki Library keeps track of the commands sent to a session. If a session fails, SafeNet ProtectToolkit-C establishes a new session and replays these commands.
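The replay idea can be illustrated with a small conceptual sketch. This is not the Cryptoki Library's implementation: the ReplayingSession class, its backend_factory parameter, the HsmFailure exception, and the FakeHsm backend are all hypothetical names invented for illustration. The sketch only shows how a journal of issued commands lets a replacement session be brought to the same state after a failure.

```python
class HsmFailure(Exception):
    """Raised by a backend when the (simulated) HSM goes away."""


class FakeHsm:
    """Hypothetical stand-in for an HSM session; it just records commands."""

    def __init__(self, fail_on_call=None):
        self.calls = 0
        self.state = []
        self.fail_on_call = fail_on_call

    def execute(self, command, arg):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise HsmFailure(command)
        self.state.append((command, arg))
        return len(self.state)


class ReplayingSession:
    """Journals every successful command so a new session can be rebuilt."""

    def __init__(self, backend_factory):
        self._factory = backend_factory
        self._backend = backend_factory()
        self._journal = []

    def execute(self, command, arg):
        try:
            result = self._backend.execute(command, arg)
        except HsmFailure:
            # Open a replacement session and replay the journal into it,
            # then retry the command that failed.
            self._backend = self._factory()
            for past_command, past_arg in self._journal:
                self._backend.execute(past_command, past_arg)
            result = self._backend.execute(command, arg)
        self._journal.append((command, arg))
        return result


# The first simulated HSM fails on its third call; the second one is healthy.
backends = iter([FakeHsm(fail_on_call=3), FakeHsm()])
session = ReplayingSession(lambda: next(backends))
session.execute("DigestInit", "SHA-256")
session.execute("DigestUpdate", b"part 1")
session.execute("DigestUpdate", b"part 2")  # fails, replays, then succeeds
```

After the third call, the replacement backend holds all three commands in order, which is the property the journal exists to guarantee.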

SafeNet ProtectToolkit-C provides the following functions in HA mode:

>Detects that a session has terminated because of HSM failure and automatically establishes a new session on a functioning HSM

>After an HSM failure is detected, periodically attempts to bring the affected HSM back online

>Restarts an object search at the point of failure

>Restarts Encrypt, Decrypt, Sign, Verify, SignRecover, VerifyRecover, and Digest operations and replays their Update operations (up to a certain data length limit)

>Creates a log entry to note significant events

>Recovers session objects created by:

C_CopyObject

C_DeriveKey

C_UnwrapKey

C_GenerateKey *

C_GenerateKeyPair *

NOTE   * Randomly generated keys (created by the functions marked with an asterisk) cannot be recovered if they are lost after they have been used in a cryptographic operation. Regenerating such a key would produce a different key, so inconsistent results may be generated.

The environment variable ET_PTKC_GENERAL_LIBRARY_MODE specifies the Cryptoki Library operating mode. This variable controls which PKCS #11 model is applied to slot and token usage (see Work Load Distribution Model and High Availability).

Valid values for this variable are NORMAL, WLD, and HA. If the variable is not defined or contains an invalid value, SafeNet ProtectToolkit-C operates in NORMAL PKCS #11 mode.
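The fallback rule can be expressed as a short check. The following is a minimal sketch in Python, not code from the product; the function name resolve_library_mode is hypothetical, and the exact, upper-case comparison is an assumption (the library itself may be more lenient about case or whitespace).

```python
import os

VALID_MODES = ("NORMAL", "WLD", "HA")


def resolve_library_mode(environ=None):
    """Return the effective mode, falling back to NORMAL as described above."""
    env = os.environ if environ is None else environ
    value = env.get("ET_PTKC_GENERAL_LIBRARY_MODE")
    # Assumption: exact, upper-case match against the three documented values.
    return value if value in VALID_MODES else "NORMAL"
```

A deployment script could call this before starting an application to confirm that HA mode will actually be selected, rather than silently running in NORMAL mode because of a typo.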

The environment variable ET_PTKC_HA_RECOVER_DELAY defines the time (in minutes) the system will wait after an HSM failure before attempting reconnection to the failed HSM. If the value is zero, reconnection is not attempted.

The environment variable ET_PTKC_HA_RECOVER_WAIT controls whether the system polls and attempts recovery after an HSM has failed. Valid values are YES and NO. This variable takes effect only when the HA feature is enabled (ET_PTKC_GENERAL_LIBRARY_MODE=HA).
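To make the interaction of the two recovery variables concrete, here is a minimal Python sketch that interprets them from a mapping of settings. The function name ha_recovery_settings and the returned dictionary keys are hypothetical. The sketch also makes assumptions the text does not confirm: unset variables are treated as 0 and NO, both variables are ignored outside HA mode, and invalid values raise an error.

```python
def ha_recovery_settings(environ):
    """Interpret the two HA recovery variables from a mapping of settings."""
    if environ.get("ET_PTKC_GENERAL_LIBRARY_MODE") != "HA":
        return None  # assumption: the recovery variables only matter in HA mode
    # Assumed defaults for unset variables: 0 minutes and NO.
    delay_minutes = int(environ.get("ET_PTKC_HA_RECOVER_DELAY", "0"))
    wait = environ.get("ET_PTKC_HA_RECOVER_WAIT", "NO")
    if delay_minutes < 0 or wait not in ("YES", "NO"):
        raise ValueError("invalid HA recovery setting")
    return {
        "reconnect": delay_minutes > 0,  # zero means reconnection is not attempted
        "delay_seconds": delay_minutes * 60,  # the variable is given in minutes
        "wait": wait == "YES",
    }
```

With a delay of 120 and a wait of YES in HA mode, the sketch reports that reconnection is attempted after 7200 seconds.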

Example

To configure a basic HA system across two SafeNet ProtectServer Network HSMs with IP addresses 192.168.1.100 and 192.168.1.101, where the participating tokens are labeled "TokName", set these configuration items (see Configuration Items):

ET_HSM_NETCLIENT_SERVERLIST=192.168.1.100 192.168.1.101
ET_PTKC_WLD_SLOT_0=TokName
ET_PTKC_GENERAL_LIBRARY_MODE=HA
ET_PTKC_HA_RECOVER_DELAY=120
ET_PTKC_HA_RECOVER_WAIT=YES
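If the configuration items are supplied as environment variables, a launcher script can merge them into the environment of the application that loads the Cryptoki Library. The Python sketch below is one way to do that; the helper name env_with_config and the application name in the final comment are hypothetical, and only three of the items are repeated to keep the sketch short.

```python
import os


def env_with_config(config_text, base_env=None):
    """Merge NAME=VALUE lines into a copy of the given environment."""
    env = dict(os.environ if base_env is None else base_env)
    for line in config_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, value = line.partition("=")
        env[name] = value
    return env


ha_env = env_with_config(
    """
    ET_PTKC_GENERAL_LIBRARY_MODE=HA
    ET_PTKC_HA_RECOVER_DELAY=120
    ET_PTKC_HA_RECOVER_WAIT=YES
    """,
    base_env={},
)
# A launcher would then pass ha_env to the child process, for example:
# subprocess.run(["my_pkcs11_app"], env=ha_env)  # hypothetical application
```

Copying the environment rather than modifying os.environ keeps the HA settings scoped to the one application that needs them.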

HA Mode Logging

When operating in HA mode, the library generates log messages for certain events.

Configuration Name      Possible Values

ET_PTKC_HA_LOG_FILE     Log filename:
                          Windows: c:\ptk_halog.log
                          Linux: /ptk_halog.log
                        For example,
                          ET_PTKC_HA_LOG_FILE=C:\temp\ha_log.log (Windows) or
                          ET_PTKC_HA_LOG_FILE=/tmp/hsm_log.log (Unix)

ET_PTKC_HA_LOG_NAME     Application name – default ptk_cryptoki

The HA feature will generate the following log messages.

Message: Session potentially not recoverable: <desc>
Type: Warning
Meaning: The application has performed an operation that makes the session unrecoverable. The <desc> field describes the type of operation. Only one message of this type is generated per C_Initialize/C_Finalize session.

Message: HSM Failure detected hsmIdx=<>, hsmSlotId=<>
Type: Error
Meaning: A session has failed due to an HSM failure and the HA feature has attempted a session recovery. The hsmIdx is the zero-based index of the failing HSM, as specified by ET_HSM_NETCLIENT_SERVERLIST or in the order the SafeNet ProtectServer Network HSMs are detected. This is the same order reported by the hsmstate utility.

Message: Found HSM Dead:HSM  Failed
Type: Error
Meaning: This message is generated only when ET_PTKC_HA_RECOVER_DELAY and ET_PTKC_HA_RECOVER_WAIT are enabled. It indicates that the library has seen an HSM fail and is holding off all application threads while it attempts to recover the lost HSM.
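A monitoring script can watch the HA log for these three messages. The Python sketch below classifies a single log line by searching for the message text listed above. The function name classify_ha_log_line and the short event labels are hypothetical, and the sketch assumes nothing about timestamps or prefixes that the library may add around each message; the hsmIdx and hsmSlotId values are captured as raw text because their exact format is not specified here.

```python
import re

# Matches the message text only; values are captured as raw strings.
HSM_FAILURE = re.compile(r"HSM Failure detected hsmIdx=(\S+?), hsmSlotId=(\S+)")
UNRECOVERABLE = "Session potentially not recoverable:"


def classify_ha_log_line(line):
    """Return (type, event, details) for a known HA message, else None."""
    match = HSM_FAILURE.search(line)
    if match:
        details = {"hsmIdx": match.group(1), "hsmSlotId": match.group(2)}
        return ("Error", "hsm_failure", details)
    if UNRECOVERABLE in line:
        desc = line.split(UNRECOVERABLE, 1)[1].strip()
        return ("Warning", "session_unrecoverable", {"desc": desc})
    if "Found HSM Dead" in line:
        return ("Error", "holding_for_recovery", {})
    return None


# Hypothetical sample line; a real log entry may carry extra fields.
event = classify_ha_log_line("HSM Failure detected hsmIdx=1, hsmSlotId=3")
```

Because hsmIdx is zero-based, an alert built on this should map the index to the matching entry in the server list before reporting which HSM failed.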