High Availability (HA) Implementations

If you use the SafeNet Luna PCIe HSM HA feature, calls to the SafeNet Luna PCIe HSMs are load-balanced. The session handle that the application receives when it opens a session is a virtual one, managed by the HA code in the library. The actual sessions with the HSMs are established by the HA code and hidden from the application; they come and go as necessary to fulfill application-level requests.
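As a rough illustration of the virtual-handle idea, the following Python sketch keeps one virtual handle per application session and opens real per-member sessions only when a call is routed to that member. It is not the library's implementation: the class name, the method names, the handle numbering, and the round-robin choice of member are all assumptions made for the example (the text above says only that calls are load-balanced, not how).

```python
import itertools


class VirtualSessionTable:
    """Maps one virtual handle to per-member sessions (illustrative only)."""

    def __init__(self, members):
        self._members = list(members)            # hypothetical member names
        self._next_virtual = itertools.count(1)  # virtual handles: 1, 2, 3, ...
        self._next_real = itertools.count(1000)  # made-up real handle numbers
        self._real = {}                          # virtual -> {member: real handle}
        self._rr = itertools.cycle(self._members)  # assumed round-robin policy

    def open_session(self):
        """Give the application a virtual handle; open no real session yet."""
        handle = next(self._next_virtual)
        self._real[handle] = {}
        return handle

    def route(self, handle):
        """Pick a member for this call and lazily open a real session on it."""
        member = next(self._rr)
        sessions = self._real[handle]
        if member not in sessions:
            sessions[member] = next(self._next_real)
        return member, sessions[member]
```

The application only ever holds the value returned by `open_session()`; the per-member handles returned by `route()` stay inside the table, which mirrors how the real sessions are hidden from the application.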

Before the introduction of HA AutoRecovery, bringing a failed/lost group member back into the group (recovery) was a manual procedure.

The Administration & Maintenance section contains a general description of how the HA AutoRecovery function works in practice.

For every PKCS#11 call, the HA recovery logic checks whether it needs to perform auto recovery to a disconnected appliance. If there is a disconnected appliance, the logic tries to reconnect to that appliance before it proceeds with the current PKCS#11 call.

The HA recovery logic is designed to try reconnecting to an appliance only once every X seconds, and at most N times, where X is pre-set to one minute and N is configurable via LunaCM.

For HA recovery attempts:

>The default retry interval is 60 seconds.

>The default number of retries is effectively infinite.

>The HA configuration section in the Chrystoki.conf/crystoki.ini file is created and populated when either the interval or the number of retries is specified in the LunaCM commands hagroup retry and hagroup interval.

The following is the pseudocode of the HA logic:

if (disconnected_member > 0 and recover_attempt_count < N and time_now - last_recover_attempt > X) then
   perform auto recovery
   set last_recover_attempt equal to time_now
   if (recovery failed) then
      increment recover_attempt_count by 1
   else
      decrement disconnected_member by 1
      reset recover_attempt_count to 0
   end if
end if
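The pseudocode above can be written as a small runnable Python sketch. This is an illustration of the stated logic only, not the library's code: the names `HARecoveryState`, `maybe_auto_recover`, and `try_recover`, and the injectable `now` clock, are hypothetical. The defaults follow the text (a 60-second interval and an effectively infinite number of retries).

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class HARecoveryState:
    """Bookkeeping from the pseudocode; all names are illustrative."""
    disconnected_members: int = 0
    recover_attempt_count: int = 0
    last_recover_attempt: float = float("-inf")  # no attempt made yet


def maybe_auto_recover(
    state: HARecoveryState,
    try_recover: Callable[[], bool],     # hypothetical: True if reconnect worked
    retry_interval: float = 60.0,        # X: seconds between attempts
    max_retries: float = float("inf"),   # N: effectively infinite by default
    now: Callable[[], float] = time.monotonic,
) -> bool:
    """Run at the start of a PKCS#11 call; return True if an attempt was made."""
    time_now = now()
    due = (
        state.disconnected_members > 0
        and state.recover_attempt_count < max_retries
        and time_now - state.last_recover_attempt > retry_interval
    )
    if not due:
        return False

    state.last_recover_attempt = time_now
    if try_recover():
        state.disconnected_members -= 1
        state.recover_attempt_count = 0
    else:
        state.recover_attempt_count += 1
    return True
```

Passing the clock in as `now` is a design choice for the sketch only; it makes the interval check easy to exercise without waiting a full minute.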

The HA auto recovery logic runs within a PKCS#11 call, so the responsiveness of recovering a disconnected member is greatly influenced by the frequency of PKCS#11 calls from the user application. Although the logic shows that it will attempt to recover a disconnected member every X seconds, in reality the attempt does not run until the user application makes its next PKCS#11 call.
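To make the dependence on call frequency concrete, the following Python sketch computes when recovery attempts would run for a given series of PKCS#11 call times. It assumes a single member that stays disconnected, that every attempt fails, and that the retry count is unlimited; the function name `recovery_attempt_times` is hypothetical.

```python
def recovery_attempt_times(call_times, retry_interval=60.0):
    """Return the call times at which a recovery attempt would run.

    Assumes one member stays disconnected and every attempt fails, so the
    only gate is the retry interval, checked once per PKCS#11 call.
    """
    attempts = []
    last_attempt = float("-inf")
    for t in sorted(call_times):
        if t - last_attempt > retry_interval:
            attempts.append(t)
            last_attempt = t
    return attempts


# One call every 10 s: an attempt runs on the first call that falls more
# than 60 s after the previous attempt.
busy = recovery_attempt_times(range(0, 301, 10))
print(busy)   # [0, 70, 140, 210, 280]

# One call every 150 s: attempts can only happen when a call arrives.
quiet = recovery_attempt_times(range(0, 301, 150))
print(quiet)  # [0, 150, 300]
```

In the second case the effective retry interval is 150 seconds even though X is 60, because no PKCS#11 call arrives in between to trigger the check.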

Detecting the Failure of an HA Member

When an HA group member first fails, the HA status for the group shows "device error" (CKR_DEVICE_ERROR) for the failed member. All subsequent calls return "token not present" (CKR_TOKEN_NOT_PRESENT) until the member (HSM Partition or PKI token) is returned to service.

Here is an example of two such calls using CKDemo:

Enter your choice : 52

Slots available:
  slot#1 - LunaNet Slot
  slot#2 - LunaNet Slot
  slot#3 - HA Virtual Card Slot
Select a slot: 3

HA group 1599447001 status:
  HSM 599447001      - CKR_DEVICE_ERROR
  HSM 78665001       - CKR_OK

Status: Doing great, no errors (CKR_OK)

TOKEN FUNCTIONS
( 1) Open Session  ( 2) Close Session  ( 3) Login
( 4) Logout        ( 5) Change PIN     ( 6) Init Token
( 7) Init Pin      ( 8) Mechanism List ( 9) Mechanism Info
(10) Get Info      (11) Slot Info      (12) Token Info
(13) Session Info  (14) Get Slot List  (15) Wait for Slot Event
(16) InitToken(ind)(17) InitPin (ind)  (18) Login (ind)
(19) CloneMofN

OBJECT MANAGEMENT FUNCTIONS
(20) Create object (21) Copy object    (22) Destroy object
(23) Object size   (24) Get attribute  (25) Set attribute
(26) Find object   (27) Display Object

SECURITY FUNCTIONS
(40)  Encrypt file (41) Decrypt file   (42)  Sign
(43)  Verify       (44) Hash file      (45)  Simple Generate Key
(46)  Digest Key

HIGH AVAILABILITY RECOVERY FUNCTIONS
(50) HA Init       (51) HA Login       (52) HA Status

KEY FUNCTIONS
(60) Wrap key      (61) Unwrap key     (62) Generate random number
(63) Derive Key    (64) PBE Key Gen    (65) Create known keys
(66) Seed RNG      (67) EC User Defined Curves

CA FUNCTIONS
(70) Set Domain    (71) Clone Key      (72) Set MofN
(73) Generate MofN (74) Activate MofN  (75) Generate Token Keys
(76) Get Token Cert(77) Sign Token Cert(78) Generate CertCo Cert
(79) Modify MofN   (86) Dup. MofN Keys (87) Deactivate MofN

CCM FUNCTIONS
(80) Module List   (81) Module Info    (82) Load Module
(83) Load Enc Mod  (84) Unload Module  (85) Module function Call

OTHERS
(90) Self Test     (94) Open Access    (95) Close Access
(97) Set App ID    (98) Options

OFFBOARD KEY STORAGE:
(101) Extract Masked Object    (102) Insert Masked Object
(103) Multisign With Value     (104) Clone Object
(105) SIMExtract               (106) SIMInsert
(107) SimMultiSign

SCRIPT EXECUTION:
(108) Execute Script
(109) Execute Asynchronous Script
(110) Execute Single Part Script

(0) Quit demo

Enter your choice : 52

Slots available:
  slot#1 - LunaNet Slot
  slot#2 - LunaNet Slot
  slot#3 - HA Virtual Card Slot
Select a slot: 3

HA group 1599447001 status:
  HSM 599447001      - CKR_TOKEN_NOT_PRESENT
  HSM 78665001       - CKR_OK

Status: Doing great, no errors (CKR_OK)
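For monitoring, a script could scan captured HA status text such as the CKDemo output above and report any member that is not CKR_OK. The Python sketch below is one possible way to do that; the function name `failed_members` and the regular expression are assumptions based only on the line layout shown in this example, not on a documented output format.

```python
import re

# Matches member lines such as "HSM 599447001      - CKR_DEVICE_ERROR".
# The pattern is inferred from the sample output only.
MEMBER_LINE = re.compile(r"HSM\s+(\d+)\s+-\s+(CKR_[A-Z_]+)")


def failed_members(status_text):
    """Return {serial: return_code_name} for members not reporting CKR_OK."""
    return {
        serial: code
        for serial, code in MEMBER_LINE.findall(status_text)
        if code != "CKR_OK"
    }


first_call = """
HA group 1599447001 status:
  HSM 599447001      - CKR_DEVICE_ERROR
  HSM 78665001       - CKR_OK
Status: Doing great, no errors (CKR_OK)
"""

later_call = first_call.replace("CKR_DEVICE_ERROR", "CKR_TOKEN_NOT_PRESENT")

print(failed_members(first_call))  # {'599447001': 'CKR_DEVICE_ERROR'}
print(failed_members(later_call))  # {'599447001': 'CKR_TOKEN_NOT_PRESENT'}
```

Either code indicates that the member is out of service; the difference is only whether the status was read on the first call after the failure or on a later one.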