Best Practices: HSMs, Partitions, Clients

HSMs secure cryptographic material and sensitive data, and they provide the means to securely access their contents and operations (key generation, key storage, cryptographic processing) and to implement secure practices. But to be useful, an HSM must be accessed and used; therefore, the material you are protecting can be only as secure as your practices surrounding the HSM allow. Here are some overall best practices for securely using Luna HSMs.

Hardware Inventory

It might seem obvious, but you should maintain an inventory of your cryptographic modules (HSMs), as you would for any other important equipment and assets, including

>model and type (standalone or appliance, embedded, in-service live or backup, etc.) and serial number

>the hostname and HSM label, or other names you assigned to each HSM

>membership, if applicable, in an HA group or groups

>physical location (geographical, rack and shelf within your data center, etc.)

>asset-tracking badge or similar identifier.

Credential Inventory

Specific to the nature and purpose of HSMs, the owner and location of each credential related to the HSM must be recorded in a document that the relevant people can find, when needed:

>for ongoing operation and personnel changes,

>for mandated credential updating/cycling (for example, password expiry and rollover),

>for disaster recovery.

The actual credentials are to be stored securely and separately. Physical keys should never be kept where a potential attacker could retrieve them, and text/character passwords should be kept on paper in a controlled-access safe or encrypted in a secure password manager.

A user who rarely needs a credential, or a backup person, might have to retrieve a password or physical key from the physical safe or from the password manager, so the location of that 'vault' or repository must be known to them as part of their duties. The list should say where the credential storage resides, who has access to the credential storage safe(s), and who must be present to open such a secure-storage lockup.

For personnel who have access to a safe storage containing

>the written passwords for Password-authenticated HSMs

or

>the iKeys (PED Keys) for multifactor quorum-authenticated HSMs, as well as secondary typed-character authentications, including passwords, PED PINs, etc.

...you want to minimize the exposure of such access. For highest security, access should require two or more persons to be present, which implies a form of split-knowledge secret for credentials: the MofN option for physical iKeys (PED Keys) or, for text-string passwords, some scheme where each of two or more persons knows only a portion of a complete password.

This, in turn, implies that there would be spare empowered persons in case of absences for illness, business travel, vacation, extreme weather, etc., such that you can always achieve quorum when needed.

The table below is an example summary to use as a starting point; keep it readily accessible to persons who might need it in the performance of their duties.

The secrets themselves are never stored in such a document, only the who and the where for retrieving them, in case of need.

| Function, role, or credential | Name (the title or label assigned to a protected container or to a scope of responsibility) | Owner [person(s) and/or their title] currently controlling the text string or iKey(s) protecting the role | Safekeeping location of access credentials (physically, or on your network, where to go in case of loss or destruction of primary credentials) |
| --- | --- | --- | --- |
| HSM Security Officer | | | |
| Partition 1 Security Officer | | | |
| Partition 1 primary Cloning Domain | | | |
| Partition 1 Crypto Officer | | | |
| Partition 1 Crypto User | | | |
| Audit User (HSM/crypto module role) | | | |
| Audit Domain | | | |
| Backup HSM Security Officer | | | |
| Backup Partition Crypto Officer | | | |
| Backup Partition Domain | | | |
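
If you keep the summary in machine-readable form, a small check can flag rows that are missing an owner or a safekeeping location. The following is a minimal sketch only: the `CredentialRecord` type, the field names, and the sample values are hypothetical and are not part of any Luna tool. As stated above, the record holds only the who and the where, never the secret itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialRecord:
    # Hypothetical record type mirroring the four columns of the summary table.
    role: str      # function, role, or credential
    name: str      # label of the protected container or scope of responsibility
    owner: str     # person(s) or title currently controlling the credential
    location: str  # where the credential is kept (never the secret itself)


def incomplete_records(records):
    """Return the roles whose record lacks a name, an owner, or a location."""
    return [
        r.role
        for r in records
        if not (r.name.strip() and r.owner.strip() and r.location.strip())
    ]


# Sample data, invented for illustration.
inventory = [
    CredentialRecord("HSM Security Officer", "hsm-prod-01", "Security lead", "Safe A, on-site"),
    CredentialRecord("Partition 1 Crypto Officer", "par1", "", "Password manager vault"),
]

print(incomplete_records(inventory))
```

Running a check like this whenever personnel change is one way to notice that a role has lost its recorded owner before the credential is actually needed.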

Roles, credentials, and areas of responsibility

HSM Security Officer

The HSM SO handles all the administrative and configuration tasks at the HSM level, including:

>Initializing the HSM and setting the SO credential.

>Setting and changing global HSM policies.

>Creating/deleting the application partition(s).

>Updating the HSM firmware.

The HSM SO credential is a password string in A-Series HSMs (Password authenticated), or a Blue iKey (PED Key) in S-Series HSMs (Multifactor-Quorum authenticated). Because this role is the administrator of the cryptographic module (HSM), its credential carries the most power(*) of all the roles. If three (3) consecutive HSM SO login attempts fail,

>application partitions are destroyed,

>the entire cryptographic module (HSM) is zeroized and

>all of its contents are rendered unrecoverable.

This threshold of three attempts is not adjustable. As soon as the HSM SO role is successfully authenticated (logged in), the bad login counter is reset to zero. Other roles have their own rules, as follows.

(* The HSM SO has the maximum administrative power over the cryptographic module, but has no access to the contents of application partitions within the crypto module (HSM). This separation of roles is inherent in the security regime underlying the Luna HSM; however, if you do not require such separation in your operations, simply set up your operational procedures to allow one person to control the access credentials for all roles on an HSM. The behavior of the cryptographic module/HSM is governed by HSM configuration settings: some of those are fixed for the module version you have purchased, while some have merely default values that you can modify with policy settings. See HSM Capabilities and Policies.)

Partition Security Officer

The Partition SO handles all administrative and configuration tasks in the application partition, a functional logical subdivision of the overall cryptographic module, including:

>Initializing the partition, setting the Partition SO (PO) credential, and setting a cloning domain secret for the partition.

>Configuring partition policies**.

>Initializing the Crypto Officer role.

>Activating the partition (only for Multifactor Quorum authenticated)

This credential is similar to the HSM SO credential, but it applies only to an individual application partition, and not to other aspects of the cryptographic module (HSM). The Partition SO credential is a password string in A-Series HSMs (Password authenticated), or a Blue iKey (PED Key) in S-Series HSMs (Multifactor-Quorum authenticated). If ten (10) consecutive Partition SO login attempts fail, the partition is zeroized and all cryptographic objects are destroyed. The Partition SO must then re-initialize the partition and the Crypto Officer role; the Crypto Officer can then restore key material from a backup device.

The Partition SO has administrative oversight for the partition, but cannot see or access partition contents. It is up to the partition's Crypto Officer to create, use, move/clone, and delete partition objects. See below.

(** Partition-level capabilities and policies are mostly inherited from related crypto-module/HSM-level capabilities and policies, and as with the HSM-level, some settings at the partition level are fixed and some merely have a default value that can be modified by policy setting configuration. See Partition Capabilities and Policies.)

Partition Crypto Officer

The Crypto Officer is the primary user of the application partition and the cryptographic objects stored on it. The Crypto Officer has the following responsibilities:

>Creating, deleting, and modifying cryptographic objects via user applications.

>Performing cryptographic operations via user applications.

>Managing backup and restore operations for partition objects.

>Creating and configuring HA groups.

>Initializing the Crypto User role.

>The CO can modify keys; in Per-Key Authorization (PKA) schemes, the CO must provide per-key authorization data for unassigned keys.

>The CO can unblock PKA keys that were blocked due to per-key authentication failures.

>The CO can increment usage counters and set or change the limit.

>The CO can perform rollover of the Scalable Key Storage Masking Key (the SMK).

>If ten consecutive Crypto Officer login attempts fail, the CO and CU roles are locked out. The default lockout threshold of 10 is governed by Max failed user logins allowed, and the Partition SO can set this threshold lower if desired (see Partition Capabilities and Policies). Recovery depends on the setting of HSM policy 15: SO can reset partition PIN, as follows:

If HSM policy 15 is set to 1 (enabled), the CO and CU roles are contingently locked out by too many consecutive failed login attempts. The lockout does not have a timed expiry, but it can be ended by the Partition SO who must unlock the CO role and reset the credential (Resetting the Crypto Officer, Limited Crypto Officer, or Crypto User Credential).

If HSM policy 15 is set to 0 (disabled), the CO and CU roles are permanently locked out and the partition contents are no longer accessible. The Partition SO must re-initialize the partition and the Crypto Officer role, who can then restore key material from a backup. This is the default setting.
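
The Crypto Officer lockout rules described above can be modeled as a small state machine. This is an illustrative sketch, not Luna code: the class name, the state names, and the assumption that a successful login resets the failure counter (implied here by the word "consecutive") are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CryptoOfficerLogin:
    # Hypothetical model of the behavior described in the text.
    so_can_reset_pin: bool        # HSM policy 15: SO can reset partition PIN
    max_failed_logins: int = 10   # "Max failed user logins allowed" (default 10)
    failed: int = 0
    state: str = "ok"

    def attempt(self, success: bool) -> str:
        """Record one login attempt and return the resulting state."""
        if self.state != "ok":
            return self.state          # already locked; attempts change nothing
        if success:
            self.failed = 0            # assumption: success clears the counter
            return self.state
        self.failed += 1
        if self.failed >= self.max_failed_logins:
            # Policy 15 ON: the Partition SO can unlock and reset the credential.
            # Policy 15 OFF: the partition must be re-initialized and restored.
            self.state = (
                "locked_until_so_reset"
                if self.so_can_reset_pin
                else "locked_permanently"
            )
        return self.state


co = CryptoOfficerLogin(so_can_reset_pin=False)
for _ in range(10):
    final_state = co.attempt(success=False)
print(final_state)
```

The model makes the cost of the default setting visible: with policy 15 OFF, the tenth consecutive failure leaves no recovery path other than re-initializing the partition and restoring from backup.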

Crypto User

The Crypto User is an optional role that can perform cryptographic operations using partition objects in a read-only capacity, but can create only public objects. This role is useful in providing limited access; the Crypto Officer is the only role that can make significant changes to the contents of the partition, while the Crypto User has the following capabilities:

>Performing operations like encrypt/decrypt and sign/verify using objects already on the partition

>Creating and backing up public objects (see Partition Backup and Restore)

>The CU can increment usage counters but, unlike CO, cannot change/set the limit

>If attempted access fails for ten consecutive Crypto User login attempts, the CU role is locked out. The default lockout threshold of 10 is governed by Max failed user logins allowed, and the Partition SO can set this threshold lower if desired (see Partition Capabilities and Policies). The CO must unlock the CU role and reset the credential (see Resetting the Crypto Officer, Limited Crypto Officer, or Crypto User Credential).

Partition Cloning Domain

A security domain or cloning domain secret is a layer of encryption that is created, during initialization, on a cryptographic module (HSM) or HSM partition that you control. The domain determines whether a cryptographic object can leave the HSM, and where it can go if it is allowed to leave.

Cloning is a secure-copy operation by which sensitive HSM objects are copied, while strongly encrypted, from one Thales HSM to another Thales HSM. The security domain, or cloning domain, is a special-purpose secret that is attached to a partition on an HSM. It determines to which, and from which, other partitions (on the same HSM or on other HSMs) the current partition can clone objects. Partitions that send or receive partition objects by means of the cloning protocol must share identical cloning domain secrets.

The cloning domain is generally set when the partition is initialized and is often forgotten about for years, until there is a need to clone the data (restoring, migrating to a newer-generation HSM, etc.). It is very strongly recommended that the domain be stored securely and well documented, so that it can be used in the future without any issues. The domain is a typed text/character secret in A-Series HSMs and a Red iKey (PED Key) in S-Series HSMs. There is no way to verify a domain other than by trying a cloning or backup procedure. Note that Luna Client-mediated High Availability (HA) uses key/object cloning as the core of the feature. See High-Availability Groups.

Domains, plural

Prior to Luna HSM firmware version 7.8.0, only one domain could exist and be used on a partition, and objects were not allowed to move from one domain to another.

From firmware version 7.8.0 onward the historic capability is expanded (see Allow Extended Domain Management), while retaining integration with existing applications, by optionally allowing the existence of up to 2 additional domains in a partition. This permits

>migrating keys between Password based and Multifactor Quorum (PED) authenticated HSMs (this includes Luna Cloud HSM)

>changing or rolling over partition domains, either in case of compromise or when mandated by an organization's security rules.

Recommendation for Multifactor Quorum (PED) authenticated HSM

It is generally recommended to use MofN for credentials. For example, with 2 of 3, at least two of the three key holders must be present at the same time to log in to the HSM.

It is also recommended to create duplicate PED keys for every role, in case of loss or damage, so this includes full duplicate sets where MofN has been invoked.

You can keep a duplicate set of iKey (PED Key) credentials in secure lockup, on premises for local recovery needs, but it is also strongly recommended to have copies of those credentials protected in secure off-site storage for purposes of disaster recovery.

NOTE   Although the iKey secrets can be split as many as 16 ways, it is recommended to make only as many splits (n) and require only as many 'members' for quorum (m) as necessary - with sufficient spares for absent key holders - because more splits means more operations with the PED, when there might be time limitations, and larger quorum requirements impose increasing demands on personnel and planning, to ensure that enough secret-split (iKey) holders can be available whenever needed.
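
The trade-off in the note above can be checked with simple arithmetic when planning who holds splits. The sketch below is a planning aid under stated assumptions: the function names and the sample holder names are hypothetical, and the only figure taken from the text is the maximum of 16 splits.

```python
MAX_SPLITS = 16  # the text states that iKey secrets can be split as many as 16 ways


def validate_scheme(m: int, n: int) -> None:
    """Raise ValueError if an M-of-N scheme is not well formed."""
    if not 1 <= m <= n <= MAX_SPLITS:
        raise ValueError(f"need 1 <= m <= n <= {MAX_SPLITS}, got m={m}, n={n}")


def tolerable_absences(m: int, n: int) -> int:
    """How many split holders can be absent while quorum is still reachable."""
    validate_scheme(m, n)
    return n - m


def quorum_reachable(m: int, holders, absent) -> bool:
    """True if at least m of the listed holders are not in the absent set."""
    validate_scheme(m, len(set(holders)))
    return len(set(holders) - set(absent)) >= m


holders = ["alice", "bob", "carol"]   # invented names for a 2-of-3 scheme
print(tolerable_absences(2, 3))
print(quorum_reachable(2, holders, absent=["alice"]))
print(quorum_reachable(2, holders, absent=["alice", "bob"]))
```

With 2 of 3, one absence is tolerable; a second absence blocks login, which is why spare empowered persons and duplicate key sets matter.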

HA Recommendations

Luna HSMs provide scalability and redundancy for critical cryptographic applications. For applications that require continuous, uninterruptible uptime, the Luna HSM Client allows you to combine application partitions on multiple HSMs into a single logical High-Availability (HA) group, where members can optionally be geographically distanced to ensure disaster resistance. See High-Availability Groups.

Following are points to take into account while setting up HA:

>All HSMs in an HA group must have the same firmware version. [see NOTE below]

>Member partitions must have a common cloning domain and the same Crypto Officer password, in order to function in a client-mediated HA group.

>It is recommended that all the HSMs in an HA group have the same policy settings. [see NOTE below]

>HA should be set up for auto-recovery. Setting the retry count to -1 makes the number of auto-recovery retries infinite. The interval between tries can be configured between 60 and 1200 seconds.

>The recovery mode defaults to activeBasic but should be changed to activeEnhanced. activeEnhanced mode adds auto-reconnect logic to the HA group that helps sessions recover on their own after a network failure.

>If physical slots are not required to be used directly by any application, it is always recommended to set up HAOnly, as that defaults the HA virtual slot to 0 and hides the physical slots. If it is not set, then

the list of all slots is visible, and changes made directly to a physical slot, rather than via the virtual slot, cause the HA group to go out of sync, defeating the advantages of HA and potentially causing trouble with applications

if one of the members fails, then without HAOnly the slot number of the HA virtual slot can change. This can cause applications to direct operations to the wrong slot. When HAOnly is enabled, the slot numbers do not change as a member is dropped or added.

>If a member is dropped from the group for more than a day, it is recommended to manually sync the HA group once the member is back in the group, to ensure inclusion of all the objects added to the group while the member was not present.

NOTE   Allowing member HSMs to have different policies or different firmware could result in some member partitions rejecting some keys (or key sizes, or curve variants, etc.), or some operations, as unsupported or forbidden, while other members accept and perform them. The HA function of the group would then be compromised.

The only situation where HA group members might be expected to have different firmware or settings is during a brief upgrade or migration interval, where all group members are expected to settle into the new conformation in short order.
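
One way to catch such drift is to compare the recorded firmware version and policy settings of each member against a reference member. The sketch below assumes you have already collected those values into a dictionary by whatever means you use; the function name, the data layout, and the sample firmware and policy values are hypothetical and do not come from a Luna API.

```python
def ha_member_mismatches(members):
    """Compare every member against the first one (by sorted name).

    members maps a member name to {"firmware": str, "policies": {name: value}}.
    Returns a list of (member, setting, expected, found) tuples.
    """
    names = sorted(members)
    if not names:
        return []
    reference = members[names[0]]
    mismatches = []
    for name in names[1:]:
        member = members[name]
        if member["firmware"] != reference["firmware"]:
            mismatches.append(
                (name, "firmware", reference["firmware"], member["firmware"])
            )
        # Check policies present on either side, so a missing one is reported too.
        all_policies = sorted(set(reference["policies"]) | set(member["policies"]))
        for policy in all_policies:
            expected = reference["policies"].get(policy)
            found = member["policies"].get(policy)
            if expected != found:
                mismatches.append((name, policy, expected, found))
    return mismatches


# Invented sample data: same firmware, one differing policy value.
group = {
    "hsm-a": {"firmware": "7.8.4", "policies": {"Allow non-FIPS algorithms": 0}},
    "hsm-b": {"firmware": "7.8.4", "policies": {"Allow non-FIPS algorithms": 1}},
}

for mismatch in ha_member_mismatches(group):
    print(mismatch)
```

An empty result means the recorded settings agree; any tuple returned identifies the member and the setting to reconcile before the group goes into production.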

Recommended configuration for HA

HA auto-recovery in activeEnhanced mode is strongly recommended

When auto-recovery is enabled, Luna HSM Client performs periodic recovery attempts when it detects a member failure. To enable auto-recovery on an HA group, see hagroup retry. For most implementations of Luna HA, we recommend HA auto-recovery with the mode set to activeEnhanced (see hagroup recoverymode). In other words, in the absence of a good reason to avoid those configuration settings, use them.

HAOnly is strongly recommended

Object management of an HA group is performed by the client that builds/owns the group, only with operations that are sent via the HA virtual slot number owned by that specific client. In other words, do use just the virtual slot for all operations and do use the hagroup haonly setting to make the individual member slots invisible, thus removing any temptation for anyone to make direct changes to individual members, bypassing the scope and control of the virtual slot, and potentially crippling HA synchronization and auto-recovery.

If your use-case demands that you directly address individual members of an HA group, be aware that objects created or deleted in that fashion are not 'known' to the HA virtual slot, and therefore are not replicated to other slots during HA synchronization; you would have to perform any replication between members manually (or via your own application) for any objects not created or deleted via the HA virtual slot. In such a use case, you might consider creating your own HA functionality, in which case see High Availability Indirect Login.

Representative Scenario

Consider an installation where you might have, say, HSMs and clients in one datacenter (A) and HSMs and clients in another datacenter (B), with connections and HA grouping both within and between the two datacenters, as you might do to spread out and minimize disaster risk. So one client has connections to members of its own HA group that reside in the local datacenter with the client, and to other members of its own HA group that reside in the other datacenter. And the reverse for a client in the other datacenter, having members in both places.

So if an object is deleted, created, or modified by an application configured to use the HA virtual slot on client "A" while that client has lost connectivity to some HA group members (but still reaches at least one), that operation is replicated to the other HA members in the other datacenter as those members are re-introduced into the group by client "A". And vice versa for HA clients in the second datacenter with members in the first datacenter.

When connectivity between datacenters is re-established, the client(s) from each datacenter synchronizes only the operations sent by each respective client to the other members of its own HA group.

Synchronization in a group propagates only changes made by the group's own client; the sync operation does not compare the entire contents of each HA member partition when that member is reintroduced. To put that another way, HA is not expecting changes that were made by other means than the HA virtual slot.

So if the HA virtual slot on client A deletes some objects, creates some objects, and modifies some objects, only those operations get replicated to the 'physical' member partitions located on the HSMs in the other datacenter. For HA auto-recovery, Client A does not directly compare the entire contents of an HSM partition HA member in one datacenter with the contents of its respective HA group members in the other datacenter. It replicates only operations that were sent to its own 'client A' HA slot. If a 'physical' partition was also modified as a member of another HA group, or if a partition was addressed directly and modified by an application without using the HA virtual slot, then those changes are not noticed by the Client overseeing its own HA group.

In the case of directly addressing individual member slots (again, not recommended unless you absolutely must), you must be diligent about cleaning up after all such operations, else the affected partitions can become cluttered, potentially slowing or disrupting HA operation.

If working outside the recommendations...

If auto-recovery was not enabled and set to activeEnhanced, or if connectivity was re-established outside any HA recovery attempts that may have been configured on the client, then the HA sync command must be executed manually.

A manually executed HA synchronization from a client compares the contents of each partition that is a member of the HA group, based on the handle IDs of objects.

>Manual sync adds objects to a partition only if it determines those objects are missing.

>Manual sync does not delete objects, nor does it sync changes to objects (such as changes made to any attributes of an object).

>Therefore, manually delete objects from a partition that you don't want replicated to other HA group members, and do that before manually synchronizing.
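
The add-only behavior of a manual sync, as described in the points above, can be illustrated with a toy model. This sketch is not how the Luna client is implemented: partitions are plain dictionaries keyed by an object identifier, and the function name and sample data are hypothetical.

```python
def manual_sync(members):
    """Toy model of an add-only synchronization.

    members maps a partition name to {object_id: attributes}.
    Returns a new mapping in which every partition holds every object id.
    Existing objects are never deleted and never have attributes overwritten.
    """
    result = {name: dict(objects) for name, objects in members.items()}
    for source in sorted(members):
        for object_id, attributes in members[source].items():
            for target_objects in result.values():
                # setdefault adds the object only where it is missing.
                target_objects.setdefault(object_id, dict(attributes))
    return result


before = {
    # Object 103 was deleted from member1 only; object 102 was relabeled on member1 only.
    "member1": {101: {"label": "key-a"}, 102: {"label": "key-b-renamed"}},
    "member2": {102: {"label": "key-b"}, 103: {"label": "stale"}},
}
after = manual_sync(before)

print(sorted(after["member1"]))
print(after["member2"][102]["label"])
```

In this model, the object deleted from only one member reappears there after the sync, and the attribute change made on only one member is not propagated. That is why unwanted objects should be deleted from every member before a manual sync.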

Summing up

The best-practice recommendation when configuring and managing HA groups is to enable HA auto-recovery in activeEnhanced mode for almost any active production scenario.

By contrast, a manual synchronization is not recommended in an active production scenario, as that could potentially cause a race condition.

HSM & Partition Policies

HSM Capabilities are features of HSM functionality, set at time of manufacture, based on the HSM model you selected at time of purchase. You can add new capabilities to the HSM by purchasing and applying capability licenses from Thales. Some capabilities (whether original or added via license update) have corresponding modifiable HSM policies.

HSM Policies are configurable settings that allow the HSM Security Officer to modify the function of their corresponding capabilities. Some policies affect HSM-wide functionality, and others allow further customization of individual partitions by the Partition Security Officer.

Some policies affect the security of the HSM. As a security measure, changing those security-affecting policies results in application partitions, or the entire HSM, being zeroized. Such policies are called Destructive policies.

CAUTION!   If your HSM, or a partition of it, contains important keys or data, you should always have a backup of your sensitive or important material before making policy changes.

Here are recommendations for setting up the policies that are most commonly used:

1. Set the destructive policies at the time of initialization, to avoid having to rebuild or restore the key material if you make the change later.

2. The policy associated with the HSM-level capability 15: Enable SO reset of partition PIN is set to OFF by default. Change the “SO can reset partition PIN” policy to ON at the time of initialization so that the Partition SO can unlock the Crypto Officer role and reset its credential in case of a lockout. If the policy is set to OFF, the entire partition has to be rebuilt in case of a lockout.

3. The policy associated with the HSM-level capability 12: Enable non-FIPS algorithms is set to ON by default. Turn the “Allow non-FIPS algorithms” policy OFF for the HSM to operate in a FIPS 140 approved configuration. The policy is destructive, hence the HSM must be reinitialized after setting this policy.

4. The policy associated with the HSM-level capability 21: Enable forcing user PIN change is set to ON by default.

For credentials created by the Partition SO, when the policy is ON, the Crypto Officer must change the password before being able to use it; this forces initial separation between the administrative and operational roles.

Turning the Force user PIN change after set/reset policy OFF allows the CO (Crypto Officer) to use the credential assigned by the Partition SO, on the assumption that your security regime doesn't demand that you maintain separation between the administrative and operational functions for your application.

5. The policy associated with the partition-level capability 22: Enable activation is not valid for password-authenticated partitions; it affects only Multifactor Quorum partitions.

When the policy is set to 0 (zero, meaning OFF), the black and/or gray PED Keys must be presented at each login via LunaCM or by a client application.

Set Allow activation to 1 (one, meaning ON) so that the iKey/PED Key secrets are encrypted and cached, and only a keyboard-entered challenge secret is required for login while the HSM remains powered.

6. The policy associated with the partition-level capability 23: Enable auto-activation also affects Multifactor Quorum partitions.

Set the policy to 0 (zero, meaning OFF), to cause the partition to deactivate in the event of a power loss, such that the return to operation after power is restored requires presenting the black and/or gray iKeys (PED Keys).

Set Allow auto-activation to 1 (one, meaning ON) so that the ability to automatically resume partition activation (without presenting iKeys) is maintained through a power loss up to 2 hours in duration. Beyond two hours, the iKey data is uncached and the primary authentication must be presented again, to resume operation with your application.

Policies 22 and 23 are applicable only to Multifactor Quorum (PED) authenticated HSMs; activation is not applicable to password authentication.
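
The combined effect of policies 22 and 23 after a power loss can be summarized as a small decision function. This is a sketch of the rules as stated above, for Multifactor Quorum partitions only; the function and parameter names are hypothetical, and the two-hour limit is taken from the text.

```python
AUTO_ACTIVATION_LIMIT_MINUTES = 120  # the text gives a power-loss limit of 2 hours


def ikeys_required_after_power_loss(
    activation: bool, auto_activation: bool, outage_minutes: float
) -> bool:
    """Return True if iKeys (PED Keys) must be presented to resume operation."""
    if not activation:
        # Policy 22 OFF: iKeys are needed at every login anyway.
        return True
    if not auto_activation:
        # Policy 23 OFF: the partition deactivates on any power loss.
        return True
    # Both ON: cached authentication survives outages of up to two hours.
    return outage_minutes > AUTO_ACTIVATION_LIMIT_MINUTES


print(ikeys_required_after_power_loss(True, True, 30))
print(ikeys_required_after_power_loss(True, True, 180))
print(ikeys_required_after_power_loss(True, False, 5))
```

A function like this can help when writing runbooks: it shows which outages need key holders on site before applications can resume.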

HSM Backup

Follow these recommendations when taking backups of HSMs:

1. In addition to having an online backup in other HSMs in HA, it is strongly recommended to have an offline backup in a Backup HSM, which helps in recovery from unexpected human errors or application errors.

2. The Backup HSM should be stored in a secure safe.

3. Backup frequency should be in line with the frequency at which the application creates keys.

4. The name of the partition on the Backup HSM should resemble the name of the partition on the source, and it is a good idea to include the date of the backup in the label itself.
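
A consistent naming scheme makes the last recommendation easy to follow. The helper below is a minimal sketch of one possible convention (source label, underscore, date as YYYYMMDD); the function name and the format are assumptions, and you should confirm that the result fits any label-length or character restrictions that apply to your Backup HSM.

```python
from datetime import date


def backup_label(source_label: str, backup_date: date) -> str:
    """Build a backup partition label from the source label and the backup date."""
    return f"{source_label}_{backup_date:%Y%m%d}"


print(backup_label("par1", date(2024, 3, 5)))  # prints par1_20240305
```

Putting the date in year-month-day order keeps labels sortable, so the most recent backup of a partition is easy to find.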

Disaster Recovery

In case of an event that results in zeroization, follow these steps to restore the HSM to a working state:

1. Clear the tamper events, if any.

2. Locate the credentials and their owners for a successful restore, referring to your checklist similar to the table that was suggested earlier in this document.

3. Initialize the HSM using the HSM SO credentials. If necessary, use credentials retrieved from off-site storage.

4. Set the required HSM policies.

5. Create the partitions and initialize them using Partition SO credentials with the same cloning domain that was used while creating the original partitions.

6. Set the appropriate partition policies (Optional).

7. Restore the objects from either another member from the same HA group or from an offline backup.

8. Connect the application to the partition and restart the application.

Logging

Audit logs record important actions and events in the cryptographic module (HSM). You can decide the level of detail that needs to be logged.

Syslog on your host system records selected events occurring in the host computer, but does not capture events within the cryptographic module.

Compare and contrast the respective logging services at Audit Logging.

>Always configure the Audit role and audit logging. In a rigorous auditing regime, you would want to configure Audit before initializing the HSM SO, as this order ensures that the audit logs capture timestamped events from the beginning of HSM usage with no gaps in the record.

>As a best practice, keep your important logs safely on remote servers. Configure syslog remotehost add and audit remotehost add, ensuring that the receiving host and port configuration are not the same for both remote syslog and remote audit log.

>Use TLS encryption when sending logs to a remote repository, and ensure that the remote syslog host(s) is/are secure from physical or network intrusion.

>Host logs on your HSM host are stored as plain text (syslog). The physical security that you erect around the HSM's host should also protect the host logs from tampering.

>Establish log rotation schedules, balancing quantity/intensity of logging with frequency of rotation - the more detailed the logs you configure, the faster the storage space is used, therefore the more frequently you should rotate the logs out and ship them off to your remote repository.

>Log only as detailed as makes sense for your industry and the demands of your auditing agencies - very intensive audit logging can consume significant HSM resources and might slow cryptographic operations for your client applications.

>When you set up HSM audit logging, be sure to verify your procedures end-to-end such that you can access (decrypt) and verify the logs you are capturing - don't wait until the auditors arrive.