Configuring a High-Availability Group

Use LunaCM to create an HA group from partitions assigned to your client. This procedure is completed by the Crypto Officer. Ensure that you have met all necessary prerequisites before proceeding with group creation. For a detailed description of HA functionality, see High-Availability Groups.

NOTE   Your LunaCM instance needs to update the Chrystoki.conf (Linux/UNIX) or crystoki.ini file (Windows) when setting up or reconfiguring HA. Ensure that you have Administrator privileges on the client workstation.

V1 partitions: If you add an application partition with an existing SMK to an HA group, the primary member's SMK overwrites the existing SMK of the joining partition. If a partition's SMK has ever been used to encrypt important SKS objects, save a backup of the SMK before adding that partition to any HA group.

The following procedures are included in the configuration process:

>Verifying an HA Group

>Setting an HA Group Member to Standby

>Configuring HA Auto-Recovery

>Enabling/Disabling HA Only Mode

>HA Logging

Prerequisites

HA groups are set up in LunaCM by the Crypto Officer. Before the CO can perform this setup, however, all HSMs and member partitions must meet the following prerequisites, completed by the HSM and Partition Security Officers.

HSMs

The HSM SO must ensure that all HSMs containing HA group member partitions meet the following prerequisites:

>All HSMs must use the same authentication method (password/multifactor quorum). Luna Cloud HSM services support password authentication only.

>All HSMs must be running one of the supported software/firmware versions. Thales generally recommends using HSMs with the same software/firmware versions for HA. However, mixed-version HA groups containing Luna USB HSM 7 member partitions and Luna Cloud HSM services are supported. See Cloning Keys Between Luna 6, Luna 7, and Luna Cloud HSM for more information.

>HSM policies 7: Allow Cloning and 16: Allow Network Replication must be set to 1 (see Setting HSM Policies Manually).

>HSM policies must be consistent across all HSMs, particularly 12: Allow non-FIPS algorithms. Do not attempt to use an HA group combining HSMs with FIPS mode on and others with FIPS mode off.

Partitions

The Partition SO must ensure that all partitions in an HA group meet the following prerequisites:

>All partitions must be visible in LunaCM on the client workstation.

>All partitions must be initialized with the same cloning domain:

password-authenticated partitions must share the same domain string.

multifactor quorum-authenticated partitions must share the same red domain iKey.

>Partition policies 0: Allow private key cloning and 4: Allow secret key cloning must be set to 1 on all partitions.

>Partition policies must be consistent across all member partitions.

>The Crypto Officer role on each partition must be initialized with the same CO credential (password or black iKey).

>Multifactor quorum-authenticated partitions must have partition policy 22: Allow activation set to 1. All partitions must use the same activation challenge secret (see Activation on Multifactor Quorum-Authenticated Partitions).

NOTE   If HSM policy 21: Force user PIN change after set/reset is set to 1 (the default setting), the Crypto Officer must change the initial CO credential before using the partition for cryptographic operations. This applies to the activation challenge secret as well (see role changepw).
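
Most of these prerequisites are simple value or equality checks, so you can tabulate them before running hagroup creategroup. The following Python sketch is one way to do that, assuming you have already copied each member's HSM and partition policy values into a hypothetical Member record by hand (for example, from the policy listings shown in LunaCM). It does not communicate with any HSM, the record and function names are illustrative only, and it cannot check CO credentials or activation challenge secrets.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    # Hypothetical record filled in by hand; not a Luna API type.
    serial: str
    auth: str                                   # "password" or "multifactor"
    hsm_policies: dict[int, int] = field(default_factory=dict)
    partition_policies: dict[int, int] = field(default_factory=dict)
    cloning_domain: str = ""                    # domain string or red iKey identifier


def check_prerequisites(members: list[Member]) -> list[str]:
    """Return the problems found; an empty list means none were detected."""
    if not members:
        return ["no members supplied"]
    ref = members[0]
    problems: list[str] = []
    for m in members:
        # HSM policies 7 (Allow Cloning) and 16 (Allow Network Replication) must be 1.
        for pol in (7, 16):
            if m.hsm_policies.get(pol) != 1:
                problems.append(f"{m.serial}: HSM policy {pol} must be 1")
        # Partition policies 0 and 4 (private/secret key cloning) must be 1.
        for pol in (0, 4):
            if m.partition_policies.get(pol) != 1:
                problems.append(f"{m.serial}: partition policy {pol} must be 1")
        # Multifactor quorum partitions also need policy 22 (Allow activation).
        if m.auth == "multifactor" and m.partition_policies.get(22) != 1:
            problems.append(f"{m.serial}: partition policy 22 must be 1")
        # Authentication, cloning domain, and all policies must match the first member.
        if m.auth != ref.auth:
            problems.append(f"{m.serial}: authentication method differs from {ref.serial}")
        if m.cloning_domain != ref.cloning_domain:
            problems.append(f"{m.serial}: cloning domain differs from {ref.serial}")
        if m.hsm_policies != ref.hsm_policies:
            problems.append(f"{m.serial}: HSM policies differ from {ref.serial}")
        if m.partition_policies != ref.partition_policies:
            problems.append(f"{m.serial}: partition policies differ from {ref.serial}")
    return problems
```

For example, two members that differ only in HSM policy 12 (Allow non-FIPS algorithms) produce a single "HSM policies differ" entry, which is exactly the FIPS/non-FIPS mix the list above warns against.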

To set up an HA group

1.Decide which partition will serve as the primary member (see The Primary Partition). Create a new HA group, specifying the following information:

the group label (do not call the group "HA")

the serial number or the slot number of the primary member partition

the Crypto Officer password or challenge secret for the partition

lunacm:>hagroup creategroup -label <label> {-slot <slotnum> | -serialnumber <serialnum>}

LunaCM generates a serial number for the HA group (by adding a "1" before the primary partition serial number), assigns it a virtual slot number, and automatically restarts.

2.Add another partition to the HA group, specifying either the slot or the serial number. If the new member contains cryptographic objects, you are prompted to decide whether to replicate the objects within the HA group, or delete them. See also Adding/Removing an HA Group Member.

lunacm:>hagroup addmember -group <grouplabel> {-slot <slotnum> | -serialnumber <serialnum>}

Repeat this step for each additional HA group member.

NOTE   By default, lunacm:>hagroup addmember automatically adds a Luna Cloud HSM service as a standby HA member. If you prefer to use the Luna Cloud HSM service as an active HA member, you must first edit the following toggle in the Chrystoki.conf/crystoki.ini configuration file (see Configuration File Summary):

[Toggles]
lunacm_cv_ha_ui = 0

3.If you are adding member partitions that already have cryptographic objects stored on them, initiate a manual synchronization. To tell whether this step is required, check the line Needs Sync : yes/no in the HA group information displayed by hagroup listgroups. Synchronizing also confirms that the HA group is functioning correctly.

lunacm:> hagroup synchronize -group <grouplabel>

4.[Optional] If you created an HA group out of empty partitions, and you want to verify that the group is functioning correctly, see Verifying an HA Group.

5.Specify which member partitions, if any, will serve as standby members.

See Setting an HA Group Member to Standby.

6.Set up and configure auto-recovery (recommended). If you choose to use manual recovery, you will have to execute a recovery command whenever a group member fails.

See Configuring HA Auto-Recovery.

7.[Optional] Enable HA Only mode (recommended).

See Enabling/Disabling HA Only Mode.

8.[Optional] Configure HA logging.

See HA Logging for procedures and information on reading HA logs.

The HA group is now ready for your application.
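
Step 1 states that LunaCM derives the HA group serial number by prepending "1" to the primary partition's serial number. The following Python sketch expresses that rule and its reverse, which can help when matching an HA group back to its primary member in hagroup listgroups output. The function names are hypothetical; the serial numbers are the ones used in the Chrystoki.conf example later in this section.

```python
from typing import Optional


def ha_group_serial(primary_serial: str) -> str:
    # Documented rule: LunaCM prepends "1" to the primary partition's serial number.
    if not primary_serial.isdigit():
        raise ValueError(f"expected a numeric serial number, got {primary_serial!r}")
    return "1" + primary_serial


def primary_of(group_serial: str, member_serials: list[str]) -> Optional[str]:
    # Reverse lookup: which listed member serial would produce this group serial?
    for serial in member_serials:
        if ha_group_serial(serial) == group_serial:
            return serial
    return None


members = ["1234840370164", "1234924189183"]
print(ha_group_serial(members[0]))            # 11234840370164
print(primary_of("11234840370164", members))  # 1234840370164
```

Serial numbers are handled as strings, so the rule is a plain concatenation rather than arithmetic.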

Verifying an HA Group

After creating an HA group in LunaCM, you can see the group represented as a virtual slot alongside the physical slots. The following procedure is one way to verify that your HA group is working as intended.

To verify an HA group

1.Exit LunaCM and run multitoken against the HA group virtual slot number to create some objects on the HA group partitions.

./multitoken -mode <keygen_mode> -key <key_size> -nodestroy -slots <HA_virtual_slot>

You can press Enter at any time to stop the process before the partitions fill up completely. Any number of created objects is sufficient to show that the HA group is functioning.

2.Run LunaCM and check the partition information on the two physical slots. Check the object count under "Partition Storage":

lunacm:> slot set -slot 0

        Current Slot Id: 0

lunacm:> partition showinfo

        Partition Storage:
                Total Storage Space:  325896
                Used Storage Space:   22120
                Free Storage Space:   303776
                Object Count:         14
                Overhead:             9648

Command Result : No Error


lunacm:> slot set -slot 1

        Current Slot Id:    1     (Luna User Slot 7.7.2 (PW) Signing With Cloning Mode)

Command Result : No Error


lunacm:> partition showinfo

        Partition Storage:
                Total Storage Space:  325896
                Used Storage Space:   22120
                Free Storage Space:   303776
                Object Count:         14
                Overhead:             9648

Command Result : No Error

3.To remove the test objects, log in to the HA virtual slot and clear the virtual partition.

lunacm:> slot set -slot <HA_virtual_slot>

lunacm:> partition login

lunacm:> partition clear
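
If you capture the partition showinfo output for each physical slot as text, the object-count comparison in step 2 can be scripted. The following Python sketch assumes the "Object Count:" line format shown in the sample output above; the function names are hypothetical. Equal counts are a quick sanity check that replication took place, not proof that the objects are identical.

```python
import re

# Matches lines such as "                Object Count:         14"
OBJECT_COUNT = re.compile(r"^\s*Object Count:\s*(\d+)\s*$", re.MULTILINE)


def object_count(showinfo_output: str) -> int:
    # Extract the object count from captured `partition showinfo` text.
    match = OBJECT_COUNT.search(showinfo_output)
    if match is None:
        raise ValueError("no 'Object Count:' line found")
    return int(match.group(1))


def members_match(outputs: dict[str, str]) -> bool:
    # True when every captured output reports the same, non-zero object count.
    counts = {object_count(text) for text in outputs.values()}
    return len(counts) == 1 and counts.pop() > 0


slot0 = """
        Partition Storage:
                Total Storage Space:  325896
                Object Count:         14
"""
slot1 = slot0  # slot 1 reported the same values in the sample above
print(members_match({"slot 0": slot0, "slot 1": slot1}))  # True
```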

If you are satisfied that your HA group is working, you can begin using your application against the HA virtual slot. The virtual slot number changes depending on how many more application partitions are added to your client configuration. If your application identifies the HA group by its label, this renumbering does not affect it. If you have applications that refer to the slot number, see Enabling/Disabling HA Only Mode.
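
An application that locates the HA group by label typically walks the slot list and compares each token label. In PKCS#11, the CK_TOKEN_INFO label field is a fixed 32-byte value padded with blank characters, so the comparison is made against a padded string. The following Python sketch works on a hypothetical slot-to-label map that an application would build from C_GetSlotList and C_GetTokenInfo; it assumes the HA virtual slot reports the HA group label as its token label, and the labels shown are illustrative.

```python
from typing import Optional


def pad_label(label: str) -> bytes:
    # CK_TOKEN_INFO.label is 32 bytes, blank-padded, and not NUL-terminated.
    raw = label.encode("utf-8")
    if len(raw) > 32:
        raise ValueError("PKCS#11 token labels are at most 32 bytes")
    return raw.ljust(32, b" ")


def slot_for_label(token_labels: dict[int, bytes], wanted: str) -> Optional[int]:
    # token_labels maps slot ID -> raw 32-byte label. Matching by label rather
    # than by a hard-coded slot number keeps working after slots are renumbered.
    target = pad_label(wanted)
    matches = [slot for slot, raw in token_labels.items() if raw == target]
    if len(matches) > 1:
        raise LookupError(f"label {wanted!r} is not unique: slots {matches}")
    return matches[0] if matches else None


before = {0: pad_label("par0"), 1: pad_label("par1"), 5: pad_label("haGroup1")}
after = {0: pad_label("par0"), 4: pad_label("haGroup1")}  # e.g. after renumbering
print(slot_for_label(before, "haGroup1"), slot_for_label(after, "haGroup1"))  # 5 4
```

Rejecting duplicate labels is a deliberate choice: if two slots share a label, silently picking one could send operations to a physical member instead of the HA virtual slot.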

Setting an HA Group Member to Standby

Some HA group members can be designated as standby members. Standby members do not perform any cryptographic operations unless all active members have failed (see Standby Members for details). They are useful as a last resort against loss of application service.
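
The standby rule can be expressed as a small selection function. The following Python sketch is a conceptual model only, assuming a hypothetical HaMember record with standby and online flags; it is not how Luna HSM Client distributes operations, and it ignores load balancing among the active members.

```python
from dataclasses import dataclass


@dataclass
class HaMember:
    serial: str
    standby: bool
    online: bool


def eligible_members(members: list[HaMember]) -> list[HaMember]:
    # Online active members handle operations whenever at least one exists.
    active = [m for m in members if m.online and not m.standby]
    if active:
        return active
    # Only when every active member has failed do online standby members take over.
    return [m for m in members if m.online and m.standby]


group = [HaMember("A", False, True), HaMember("B", False, True), HaMember("C", True, True)]
print([m.serial for m in eligible_members(group)])  # ['A', 'B']
group[0].online = group[1].online = False
print([m.serial for m in eligible_members(group)])  # ['C']
```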

Prerequisites

>The partition you want to designate as a standby member must already be a member of the HA group (see Adding/Removing an HA Group Member).

>The group member must be online.

>The Crypto Officer must perform this procedure.

To set an HA group member to standby

1.[Optional] Check the serial number of the member you wish to set to standby mode.

lunacm:> hagroup listgroups

2.Set the desired member to standby mode by specifying the serial number.

lunacm:> hagroup addstandby -group <label> -serialnumber <member_serialnum>

To make a standby HA member active

NOTE   By default, a Luna Cloud HSM service is added to an HA group as a standby member. If you prefer to use the Luna Cloud HSM service as an active HA member, you must first edit the following toggle in the Chrystoki.conf/crystoki.ini configuration file (see Configuration File Summary):

[Toggles]
lunacm_cv_ha_ui = 0

1.[Optional] Check the serial number of the standby member.

lunacm:> hagroup listgroups

2.Remove the member from standby and return it to active HA use.

lunacm:> hagroup removestandby -group <label> -serialnumber <member_serialnum>

Configuring HA Auto-Recovery

When auto-recovery is enabled, Luna HSM Client performs periodic recovery attempts when it detects a member failure. HA auto-recovery is disabled by default for new HA groups. To enable it, you must set a maximum number of recovery attempts. You can also set the frequency of recovery attempts and the auto-recovery mode (activeBasic or activeEnhanced). These settings apply to all HA groups configured on the client.

To configure HA auto-recovery

1.Set the desired number of recovery attempts by specifying the retry count as follows:

Set a value of 0 to disable HA auto-recovery

Set a value of -1 for unlimited retries

Set any specific number of retries from 1 to 500

lunacm:> hagroup retry -count <retries>

2.[Optional] Set the desired frequency of recovery attempts by specifying the time in seconds. The acceptable range is 60-1200 seconds (default: 60).

lunacm:> hagroup interval -interval <seconds>

3.[Optional] Set the auto-recovery mode. The default is activeBasic.

lunacm:> hagroup recoverymode -mode {activeBasic | activeEnhanced}

4.[Optional] Check that auto-recovery has been enabled. You are prompted for the Crypto Officer password/challenge secret.

lunacm:> hagroup listgroups
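
The retry count and interval together determine roughly how long the client keeps trying to recover a failed member. The following Python sketch checks values against the ranges documented above and estimates that window as the retry count times the interval, assuming attempts are spaced exactly one interval apart (a simplification). The function names are hypothetical. For a retry count of 120 (the reconnAtt value in the configuration example later in this section) and the default 60-second interval, the estimate is 7200 seconds, or two hours.

```python
from typing import Optional


def validate_retry(count: int) -> str:
    # hagroup retry -count: 0 disables, -1 means unlimited, 1-500 is a fixed limit.
    if count == 0:
        return "disabled"
    if count == -1:
        return "unlimited"
    if 1 <= count <= 500:
        return f"{count} attempts"
    raise ValueError(f"retry count {count} is not 0, -1, or 1-500")


def validate_interval(seconds: int) -> int:
    # hagroup interval -interval: accepted range is 60-1200 seconds (default 60).
    if not 60 <= seconds <= 1200:
        raise ValueError(f"interval {seconds}s is outside 60-1200")
    return seconds


def recovery_window(count: int, seconds: int = 60) -> Optional[int]:
    # Rough estimate, in seconds, of how long auto-recovery keeps trying.
    validate_retry(count)
    validate_interval(seconds)
    if count == 0:
        return 0       # auto-recovery disabled: no attempts are made
    if count == -1:
        return None    # unlimited retries: no bound
    return count * seconds


print(recovery_window(120))  # 7200
```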

Enabling/Disabling HA Only Mode

By default, client applications can see both physical slots and virtual HA slots. Directing applications at the physical slots bypasses the high availability and load balancing functionality. An application must be directed at the virtual HA slot to use HA load balancing and redundancy. HA Only mode hides the physical slots and leaves only the HA group slots visible to applications, simplifying the PKCS#11 slot numbering.

If an HA group member partition fails and is recovered, all visible slot numbers can change, including the HA group virtual slots. This can cause applications to direct operations to the wrong slot. If a physical slot in the HA group receives a direct request, the results will not be replicated on the other partitions in the group. When HA Only mode is enabled, the HA virtual slots are not affected by partition slot changes. Thales recommends enabling HA Only mode on all clients running HA groups.

NOTE   Individual partition slots are still visible in LunaCM when HA Only mode is enabled. They are hidden only from client applications. Use CKdemo (Option 11) to see the slot numbers to use with client applications.

To enable HA Only mode

1.Enable HA Only mode in LunaCM.

lunacm:> hagroup haonly -enable

2.[Optional] Check the status of HA Only mode at any time. Because LunaCM still displays the individual partitions when HA Only mode is enabled, use this command to confirm the current setting.

lunacm:> hagroup haonly -show

To disable HA Only mode

1.Disable HA Only mode in LunaCM.

lunacm:> hagroup haonly -disable

HA Logging

Logging of HA-related events takes place on the Luna HSM Client workstation. The log file haErrorLog.txt shows HA errors, as well as add-member and delete-member events. It does not record status changes of the group as a whole (like adding or removing the group).

The HA log rotates after the configured maximum length is reached. When it finishes writing the current record (even if that record slightly exceeds the configured maximum), the file is renamed to include the timestamp and the next log entry begins a new haErrorLog.txt.
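
The rotation rule is record-granular: a record is always written whole, and the size check happens afterwards, so a rotated file can slightly exceed the configured maximum. The following Python sketch models that behavior in memory, assuming a hypothetical RotatingHaLog class; the timestamped archive name it generates is an illustrative pattern only, since the exact naming the client uses is not specified here.

```python
from datetime import datetime


class RotatingHaLog:
    # In-memory model of the documented rotation rule; not the client's code.

    def __init__(self, max_len: int) -> None:
        self.max_len = max_len
        self.records: list[str] = []           # contents of the current haErrorLog.txt
        self.size = 0
        self.archives: list[tuple[str, list[str]]] = []

    def write(self, record: str, now: datetime) -> None:
        # The whole record is written first, even if it overshoots max_len.
        self.records.append(record)
        self.size += len(record.encode("utf-8"))
        if self.size >= self.max_len:
            # Hypothetical naming pattern: base name plus a timestamp suffix.
            name = f"haErrorLog.txt.{now:%Y%m%d%H%M%S}"
            self.archives.append((name, self.records))
            self.records, self.size = [], 0   # next entry starts a new file


log = RotatingHaLog(max_len=50)
t = datetime(2017, 10, 4, 16, 29, 21)
log.write("x" * 30, t)   # 30 bytes: below the limit, no rotation
log.write("y" * 30, t)   # 60 bytes: overshoots, so the file is rotated
print(log.archives[0][0], len(log.archives[0][1]), log.size)
# haErrorLog.txt.20171004162921 2 0
```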

Configuring HA Logging

Logging is automatically enabled when you configure an HA group, but you must configure a valid destination path before logging can begin. HA groups are configured on the client using LunaCM. The HA configuration settings are saved to the Chrystoki.conf (Linux/Unix) or crystoki.ini (Windows) file, as illustrated in the following example:

VirtualToken = {
VirtualToken00Label = haGroup1; // The label of the HA group.
VirtualToken00SN = 11234840370164; // The pseudo serial number of the HA group.
VirtualToken00Members = 1234840370164, 1234924189183; // The serial numbers of the members.
VirtualTokenActiveRecovery = activeEnhanced; // The recovery mode.
}
HASynchronize = {
haGroup1 = 1; // Enable automatic synchronization of objects.
}
HAConfiguration = {
HAOnly = 1; // Enable listing HA groups only via PKCS#11 library.
haLogPath = /tmp/halog; // Base path of the HA log file; i.e., “/tmp/halog/haErrorLog.txt”.
haLogStatus = enabled; // Enable HA log.
logLen = 100000000; // Maximum size of HA log file in bytes.
failover_on_deactivation = 1; // if a partition becomes deactivated then the client will immediately 
                              // failover and resume its operation on the other HA partitions. This 
                              // is currently an alpha feature
reconnAtt = 120; // Number of recovery attempts.
}
HARecovery = {
haGroup1 = 1; // Deprecated in this release as auto recovery will cover the use case. When cryptoki 
              // loads into memory it reads the number and if the number changes (gets incremented) 
              // then cryptoki interprets this as a manual recovery attempt.
}
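
Because LunaCM writes these settings for you, it is usually enough to read them back, for example to find the current log path or member list. The following Python sketch parses blocks in the Name = { key = value; } shape shown above, assuming that shape is all it needs to handle. It treats any text after // as an annotation and drops it, so it would misread a value that itself contains //. The function name is hypothetical, and the sketch is intended for inspection only; make changes with LunaCM.

```python
import re

SECTION = re.compile(r"^\s*(\w+)\s*=\s*\{\s*$")
ENTRY = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*;$")


def parse_sections(text: str) -> dict[str, dict[str, str]]:
    sections: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        line = line.split("//", 1)[0].rstrip()   # drop // annotations
        if not line:
            continue
        if (m := SECTION.match(line)) is not None:
            current = sections.setdefault(m.group(1), {})
        elif line.strip() == "}":
            current = None
        elif current is not None and (m := ENTRY.match(line)) is not None:
            current[m.group(1)] = m.group(2)
    return sections


sample = """
VirtualToken = {
VirtualToken00Label = haGroup1; // The label of the HA group.
VirtualToken00Members = 1234840370164, 1234924189183; // The serial numbers of the members.
}
HAConfiguration = {
haLogPath = /tmp/halog; // Base path of the HA log file.
failover_on_deactivation = 1; // if a partition becomes deactivated then the client will immediately
                              // failover and resume its operation on the other HA partitions.
}
"""
conf = parse_sections(sample)
members = [s.strip() for s in conf["VirtualToken"]["VirtualToken00Members"].split(",")]
print(members)                               # ['1234840370164', '1234924189183']
print(conf["HAConfiguration"]["haLogPath"])  # /tmp/halog
```
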

To configure HA logging

Use the LunaCM command hagroup halog.

1.Set a valid path for the log directory. You must specify an existing directory.

lunacm:> hagroup halog -path <filepath>

2.[Optional] Set the maximum length for individual log files (in bytes).

lunacm:> hagroup halog -maxlength <max_file_length>

3.[Optional] Enable or disable HA logging at any time.

lunacm:> hagroup halog -disable

lunacm:> hagroup halog -enable

4.[Optional] View the current status of the HA logging configuration.

lunacm:> hagroup halog -show

HA Log Messages

The following table provides descriptions of the messages generated by the HA sub-system and saved to the HA log. The HA log is saved to the location specified by haLogPath in the Chrystoki.conf (Linux/Unix) or crystoki.ini (Windows) file.

Message Format

Every HA log message has a consistent prefix consisting of the date, time, process id, and serial number (of the affected HA group). For example:

Wed Oct  4 16:29:21 2017 : [17469] HA group: 11234840370164 …
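
Because the prefix has a fixed shape, it can be split off with a regular expression before the rest of the message is examined. The following Python sketch assumes the ctime-style timestamp shown in the example (English day and month names, with a space-padded day of the month); the pattern and function name are illustrative, not part of Luna HSM Client.

```python
import re
from datetime import datetime
from typing import Optional

PREFIX = re.compile(
    r"^(?P<ts>\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) : "
    r"\[(?P<pid>\d+)\] HA group: (?P<group>\d+)\s*(?P<rest>.*)$"
)


def parse_prefix(line: str) -> Optional[dict]:
    # Split one HA log line into timestamp, process ID, group serial, and message.
    m = PREFIX.match(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y"),
        "pid": int(m["pid"]),
        "group": m["group"],
        "message": m["rest"],
    }


line = ("Wed Oct  4 16:29:21 2017 : [17469] HA group: 11234840370164 "
        "has dropped member: 1234924189183")
print(parse_prefix(line)["message"])  # has dropped member: 1234924189183
```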

Message Descriptions

Message ID Message/Description
HALOG_CONFIGURED_AS_PASSWORD

<MessagePrefix> configured as a "PASSWORD Based" virtual device

Description: Message advising that the virtual partition is password-authenticated. This means that you cannot add a PED-authenticated member to the group.

HALOG_CONFIGURED_AS_PED

<MessagePrefix> configured as a "PED Based" virtual device

Description: Message advising that the virtual partition is PED-authenticated. This means that you cannot add a password-authenticated member to the group.

HALOG_DROPMEMBER

<MessagePrefix> has dropped member: <SerialNumber>

Description: The connection changed from valid to invalid, determined after an HSM command (such as C_Sign) fails.

HALOG_DROPUNRECOVERABLE

<MessagePrefix> unable to reach member: <SerialNumber>. Manual Recover or Auto Recovery will be able to recover this member

Description: The connection is invalid, as determined during a call to C_Initialize.

HALOG_LOGINFAILED

<MessagePrefix> can not login to member: <SerialNumber>, autorecovery will be disabled. Code: <ErrorCodeHex> : <ErrorCodeString>

Description: The connection changed from valid to invalid, as determined during a call to C_Login.

HALOG_MEMBER_DEACTIVATED

<MessagePrefix> member: <SerialNumber> deactivated

Description: The user manually deactivated the partition, as determined after an HSM command (such as C_Sign) fails.

HALOG_MEMBER_NOW_ACTIVATED

<MessagePrefix> recovery attempt <AttemptNumber> member <SerialNumber> is now activated and will be reintroduce back into the HA group.

Description: Additional info about the recovered partition, which was deactivated and is now becoming activated.

HALOG_MEMBER_REVOKED

<MessagePrefix> member: <SerialNumber> revoked

Description: The user manually revoked the partition, as determined during a periodic recovery attempt.

HALOG_MEMBERS_OFFLINE

<MessagePrefix> all members gone offline.

Description: All members of the HA group have gone offline. Recovery is not possible at this point.

HALOG_MGMT_THREAD_START

<MessagePrefix> management thread started

Description: This thread is responsible for managing all members and HA in general while the HA group is active. The thread starts up when the application first launches.

HALOG_MGMT_THREAD_TERMINATE

<MessagePrefix> management thread terminated

Description: This thread is responsible for managing all members and HA in general while the HA group is active. If the client application shuts down, this thread will simply terminate. The thread will start up again once the application re-launches.

HALOG_NEWMEMBER

<MessagePrefix> detected new member member: <SerialNumber>

Description: The user manually added a member to the HA group without restarting the application, as determined during a periodic recovery attempt.

HALOG_RECOVERED

<MessagePrefix> recovery attempt <Integer> succeeded for member: <SerialNumber>

Description: The connection changed from invalid to valid, as determined during a periodic recovery attempt.

HALOG_RECOVERY_ATTEMPT_#_REINTRODUCING

<MessagePrefix> recovery attempt <AttemptNumber> reintroducing <Number> token objects to recovered token <TokenNumber>

Description: Additional information about the recovered partition, to which some objects were cloned.

HALOG_RECOVERYFAILED

<MessagePrefix> recovery attempt <Integer> failed for member: <SerialNumber>. Code: <ErrorCodeHex> : <ErrorCodeString>.

If the maximum number of auto-recovery attempts is exceeded, a second message is logged, as follows:

<MessagePrefix> exceeded maximum number of autorecovery attempts for member: <SerialNumber>. Autorecovery will be disabled

Description: The connection remains invalid, as determined during a periodic recovery attempt.

HALOG_RENABLEMEMBER (deprecated)

<MessagePrefix> Re-enable auto recovery process for member: <SerialNumber>

Description: The user manually requested partition recovery, as determined during a periodic recovery attempt before an HSM command.

HALOG_UNRECOVERABLE (deprecated)

<MessagePrefix> recovery attempt <Integer> failed for member: <SerialNumber>. Manual Recover or Auto Recovery will not be able to recover this member. Code: <ErrorCodeHex> : <ErrorCodeString>

Description: The connection is invalid and is not eligible for recovery.

No ID*

<MessagePrefix> member <SerialNumber> is not activated and is excluded from the HA group

Description: The HA member was not activated at the time when a C_Initialize call was made, and is therefore excluded from the HA group. Once the partition is activated, the HA group will attempt an automatic recovery, resulting in one of the two messages below.

No ID*

<MessagePrefix> recovery attempt <SerialNumber> is not activated and cannot be reintroduced back into the HA group\n

Description: Recovery failed

No ID*

<MessagePrefix> recovery attempt <SerialNumber> is now activated and will be reintroduce back into the HA group.\n

Description: Recovery succeeded

* You might encounter these extra messages in the HA logs. They were added for HA development testing and therefore have no Message IDs assigned to them. They could duplicate information covered by other log messages as defined above.
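
To summarize a log, each message can be matched against its documented template. The following Python sketch covers a subset of the Message IDs in the table above, assuming the logged text matches the templates as shown; the pattern list and function names are illustrative. HALOG_RECOVERYFAILED appears twice because that ID covers both the per-attempt failure and the follow-up "exceeded maximum" message, and its per-attempt pattern requires ". Code:" after the serial number so that it does not also match the deprecated HALOG_UNRECOVERABLE text.

```python
import re
from collections import Counter
from typing import Optional

# (Message ID, pattern) pairs taken from a subset of the documented templates.
PATTERNS = [
    ("HALOG_DROPMEMBER", r"has dropped member: (?P<member>\d+)"),
    ("HALOG_DROPUNRECOVERABLE", r"unable to reach member: (?P<member>\d+)"),
    ("HALOG_LOGINFAILED", r"can not login to member: (?P<member>\d+)"),
    ("HALOG_RECOVERED", r"recovery attempt \d+ succeeded for member: (?P<member>\d+)"),
    ("HALOG_RECOVERYFAILED", r"recovery attempt \d+ failed for member: (?P<member>\d+)\. Code:"),
    ("HALOG_RECOVERYFAILED",
     r"exceeded maximum number of autorecovery attempts for member: (?P<member>\d+)"),
    ("HALOG_MEMBERS_OFFLINE", r"all members gone offline"),
]
COMPILED = [(msg_id, re.compile(p)) for msg_id, p in PATTERNS]


def classify(line: str) -> Optional[tuple[str, Optional[str]]]:
    # Return (Message ID, member serial or None), or None if no pattern matches.
    for msg_id, rx in COMPILED:
        m = rx.search(line)
        if m is not None:
            return msg_id, m.groupdict().get("member")
    return None


def summarize(lines: list[str]) -> Counter:
    # Count the recognized Message IDs across a log.
    return Counter(r[0] for r in map(classify, lines) if r is not None)


# Sample lines built from the templates above.
p = "Wed Oct  4 16:29:21 2017 : [17469] HA group: 11234840370164 "
log = [
    p + "has dropped member: 1234924189183",
    p + "recovery attempt 1 failed for member: 1234924189183. Code: 0x00000030 : CKR_DEVICE_ERROR.",
    p + "recovery attempt 2 succeeded for member: 1234924189183",
    p + "management thread started",
]
print(classify(log[0]))                        # ('HALOG_DROPMEMBER', '1234924189183')
print(summarize(log)["HALOG_RECOVERYFAILED"])  # 1
```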