RMG failure detection and recovery

To facilitate failure detection and recovery, the primary RMG process monitors the board through the Health Management service. The primary RMG periodically sends heartbeat messages to the backup, allowing the backup to monitor the primary's status.

When the primary RMG process detects a board failure or receives a reload or halt command, it initiates failure recovery by negotiating a switchover to the backup board. If possible, the failed board is then reloaded and brought back into service as the backup board. The exception is a halt command, in which case the board is halted and remains out of service.
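The recovery sequence described above can be sketched as a small decision function. This is only an illustration of the described behavior, not RMG code: the names Trigger, recovery_actions, and reload_possible, and the action strings, are all hypothetical.

```python
from enum import Enum
from typing import List


class Trigger(Enum):
    """Events that start failure recovery on the primary (hypothetical names)."""
    BOARD_FAILURE = "board_failure"
    RELOAD = "reload"
    HALT = "halt"


def recovery_actions(trigger: Trigger, reload_possible: bool = True) -> List[str]:
    """Return the ordered actions the primary takes for a given trigger.

    The action strings are placeholders for illustration only.
    """
    # Every trigger begins with a negotiated switchover to the backup board.
    actions = ["switchover_to_backup"]
    if trigger is Trigger.HALT:
        # A halted board remains out of service; it is not reloaded.
        actions.append("halt_board")
    elif reload_possible:
        # Otherwise the board is reloaded and returns as the new backup.
        actions += ["reload_board", "rejoin_as_backup"]
    return actions
```

For example, recovery_actions(Trigger.HALT) yields only the switchover followed by the halt, while recovery_actions(Trigger.BOARD_FAILURE) also includes the reload and the return to service as backup.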

The RMG process (both primary and backup) also supports a planned changeover command that causes the primary and backup boards to switch roles.

To detect a failure of the primary RMG process or signaling node, the backup RMG continuously monitors for the receipt of heartbeat messages from the primary. If no heartbeat messages are received for five consecutive heartbeat periods, the backup initiates its own recovery and switches to primary mode.
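The timeout rule above can be illustrated with a minimal sketch of the backup's monitoring logic. It assumes that "five consecutive heartbeat periods" can be approximated by the time elapsed since the last heartbeat; the class BackupMonitor, its methods, and the heartbeat_period parameter are hypothetical and are not part of any RMG interface.

```python
from dataclasses import dataclass

# From the text: the backup takes over after five silent heartbeat periods.
MISSED_PERIODS_BEFORE_TAKEOVER = 5


@dataclass
class BackupMonitor:
    """Tracks heartbeats from the primary (hypothetical illustration)."""
    heartbeat_period: float      # seconds between heartbeats; value is an assumption
    last_heartbeat: float = 0.0  # timestamp of the most recent heartbeat
    role: str = "backup"

    def on_heartbeat(self, now: float) -> None:
        """Record the arrival time of a heartbeat from the primary."""
        self.last_heartbeat = now

    def missed_periods(self, now: float) -> int:
        """Count whole heartbeat periods elapsed since the last heartbeat."""
        return int((now - self.last_heartbeat) // self.heartbeat_period)

    def poll(self, now: float) -> str:
        """Switch to primary mode once enough periods have passed in silence."""
        if (self.role == "backup"
                and self.missed_periods(now) >= MISSED_PERIODS_BEFORE_TAKEOVER):
            self.role = "primary"
        return self.role
```

Timestamps are passed in explicitly rather than read from a clock, so the rule can be exercised without waiting for real time to pass. Any heartbeat that arrives before the threshold resets the count.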