HA Recovery

HA recovery is the automatic return of a failed HA group member to the group or, if autorecovery has not been enabled, the manual re-introduction of the failed member. Reasons for a member to drop out of the group include:

- the appliance loses power (but regains it within the 2 hours for which the HSM preserves its activation state)

- the network link from the unit is lost and then regained

HA recovery takes place if:

- HA autorecovery is enabled, or you detect a unit failure and manually re-introduce the unit (or its replacement)

- the HA group has at least 2 nodes

- the HA node is reachable (connected) at client startup

- the HA node recovery retry limit has not been reached; otherwise, manual recovery is the only option to bring back the downed connection(s)

If all HA nodes fail (no links from the client), no recovery is possible.
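The conditions above can be read as a simple decision. The following Python sketch only illustrates that reading. It is not the client library's implementation, and the names `HaGroupState` and `recovery_mode` are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class HaGroupState:
    """Hypothetical snapshot of an HA group as seen by one client."""
    member_count: int           # members configured in the HA group
    reachable_members: int      # members the client can currently reach
    autorecovery_enabled: bool  # assumed True when a retry count is set
    retries_used: int           # automatic recovery attempts made so far
    retry_limit: int            # configured recovery retry count


def recovery_mode(state: HaGroupState) -> str:
    """Return 'none', 'manual', or 'automatic' for a failed member."""
    if state.member_count < 2:
        return "none"        # the group needs at least 2 nodes
    if state.reachable_members == 0:
        return "none"        # all nodes failed: no links from the client
    if not state.autorecovery_enabled:
        return "manual"      # the member must be re-introduced by hand
    if state.retries_used >= state.retry_limit:
        return "manual"      # retry limit reached
    return "automatic"
```

For example, a two-member group with one reachable member, autorecovery enabled, and no retries used so far evaluates to `automatic`; once the retry limit is used up, the same group evaluates to `manual`.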

The HA recovery logic in the library makes its first attempt at recovering a failed member when your application makes a call to its HSM (the group). That is, an idle client does not start the recovery-attempt process.

A busy client, on the other hand, notices a slight pause every minute as the library attempts to recover a dropped HA group member (or members). The attempts continue until the member has been reinstated or until the retry limit has been reached, at which point the library stops trying. Therefore, set the number of retries to suit your normal situation (for example, the kinds and durations of network interruptions you experience).
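The behavior described above (attempts triggered only by application calls, spaced about a minute apart, and stopping at the retry limit) can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not the library's actual scheduling code. The function `recovery_attempt_times` and its parameters are hypothetical, and the 60-second interval is taken from the "every minute" wording above.

```python
from typing import Callable, List


def recovery_attempt_times(
    call_times: List[float],
    retry_limit: int,
    retry_interval: float = 60.0,  # assumed from "a slight pause every minute"
    member_recovered: Callable[[float], bool] = lambda t: False,
) -> List[float]:
    """Return the times (in seconds) at which recovery would be attempted.

    Attempts piggyback on application calls, so an idle client never retries.
    """
    attempts: List[float] = []
    for t in sorted(call_times):
        if len(attempts) >= retry_limit:
            break  # retry limit reached; manual recovery is now required
        if attempts and t - attempts[-1] < retry_interval:
            continue  # less than one interval since the last attempt
        attempts.append(t)
        if member_recovered(t):
            break  # member reinstated; no further attempts needed
    return attempts


# A client calling the HSM every 10 seconds for 5 minutes, retry count 3.
busy_calls = list(range(0, 300, 10))
print(recovery_attempt_times(busy_calls, retry_limit=3))  # [0, 60, 120]
```

In this model, an empty `call_times` list yields no attempts at all, which matches the statement that an idle client does not start the recovery process.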

HA Autorecovery vs Manual Recovery

In previous releases, Autorecovery was not on by default and had to be explicitly enabled with the vtl haAdmin -autorecovery command.

Beginning with Luna HSM release 6.0, HA Autorecovery is automatically enabled when the recovery retry count is set.

lunacm:> ha rt -c 3

        HA Auto Recovery Count has been set to 3

Command Result : No Error

lunacm:>

 

For practical steps to replace a failed HA group member, see "HA Replacing a Failed Luna SA".