Signaling board failures

The HMI on the local signaling node detects signaling board failures. A failure can be either a software failure on the board, detected by the txmon process and reported to the HMI service, or a hardware failure that causes the HMI service to lose communication with the board. Both kinds are reported to registered applications as board failures so that recovery can take place. Two recovery scenarios are distinguished: failure of the primary board and failure of the backup board.
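As a rough illustration, the following sketch shows how an application might map a reported board failure onto one of the two recovery scenarios. The types `node_state`, `failure_source`, and `recovery_scenario` and the function `classify_failure` are hypothetical and are not part of the HMI API. The sketch only makes one point: the failure source (a txmon-detected software failure or lost communication) does not change the recovery path. Only the failed board's current role does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application-side types; not part of the HMI API. */
enum failure_source { FAIL_SOFTWARE_TXMON, FAIL_HARDWARE_COMM_LOST };
enum recovery_scenario { RECOVER_NONE, RECOVER_PRIMARY, RECOVER_BACKUP };

struct node_state {
    int primary_board; /* board number currently in the primary role */
    int backup_board;  /* board number currently in the backup role */
};

/* Both failure sources arrive as a "board failure"; the scenario is chosen
   solely by the role the failed board held at the time of the failure. */
enum recovery_scenario classify_failure(const struct node_state *node,
                                        int failed_board,
                                        enum failure_source source)
{
    (void)source; /* software and hardware failures are handled alike */
    assert(node != NULL);
    if (failed_board == node->primary_board)
        return RECOVER_PRIMARY;
    if (failed_board == node->backup_board)
        return RECOVER_BACKUP;
    return RECOVER_NONE; /* not one of the boards in this redundant pair */
}
```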

Primary board failure

When the application receives an HMI event indicating that the primary board has failed, it typically initiates a switchover to the backup signaling board by calling hmiPrimary. Call processing applications (and other applications that use SS7 signaling services) then wait for the now-primary status indication from their service providers before resuming data traffic.
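The switchover handling can be sketched as a small piece of application state: traffic is held when the switchover starts and is resumed only when the service provider reports that it is now primary. This is a minimal sketch under stated assumptions. `call_app`, `begin_switchover`, `on_service_status`, and the `svc_status` values are hypothetical. The caller is assumed to pass the returned board number to hmiPrimary itself, so the real hmiPrimary signature is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Status indications from an SS7 service provider (hypothetical names). */
enum svc_status { SVC_STATUS_BACKUP, SVC_STATUS_PRIMARY };

struct call_app {
    int  primary_board;        /* board currently carrying traffic */
    int  backup_board;         /* standby board */
    bool traffic_enabled;      /* may the application send data traffic? */
    bool awaiting_primary_ind; /* switchover requested, not yet confirmed */
};

/* On a primary-board failure: hold traffic, swap the recorded roles, and
   return the board the caller should promote with hmiPrimary. The failed
   board is now recorded as the backup, i.e. the board to reload. */
int begin_switchover(struct call_app *app)
{
    assert(app != NULL);
    int failed = app->primary_board;
    app->primary_board = app->backup_board;
    app->backup_board = failed;
    app->traffic_enabled = false;
    app->awaiting_primary_ind = true;
    return app->primary_board;
}

/* Called for each status indication from the service provider: resume
   traffic only once the promoted board reports that it is now primary. */
void on_service_status(struct call_app *app, enum svc_status status)
{
    assert(app != NULL);
    if (app->awaiting_primary_ind && status == SVC_STATUS_PRIMARY) {
        app->awaiting_primary_ind = false;
        app->traffic_enabled = true;
    }
}
```

Keeping the "awaiting" flag separate from the traffic flag means a stray backup-status indication during the switchover does not accidentally re-enable traffic.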

Once the switchover is initiated, the application reloads the failed board with hmiLoadBoard. When the download is complete (indicated by the HMI_EVN_STARTING event), the application places the reloaded board in the backup state with hmiBackup. At that point, all SS7 service applications must rebind to their service providers. The primary MTP 3 automatically activates any failed signaling links that terminate on the reloaded board. If M3UA is configured as an ASP or IPSP client, the reloaded board automatically re-establishes its SIGTRAN associations.
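The reload sequence is essentially a two-event state machine. The following sketch, which assumes an event-driven application, returns a bit mask of the actions the application should take next. The names `reload_ctx`, `reload_step`, `EV_STARTING`, and the `ACT_*` flags are hypothetical. `EV_STARTING` stands for receipt of HMI_EVN_STARTING but does not use that constant's real value. Link activation and M3UA association setup have no corresponding action because the text states they happen automatically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events; EV_STARTING stands for receipt of HMI_EVN_STARTING. */
enum reload_event { EV_BOARD_FAILED, EV_STARTING };

/* Actions the application should take next, as a bit mask. */
enum {
    ACT_NONE       = 0,
    ACT_LOAD_BOARD = 1 << 0, /* call hmiLoadBoard on the failed board */
    ACT_SET_BACKUP = 1 << 1, /* call hmiBackup on the reloaded board */
    ACT_REBIND     = 1 << 2  /* rebind SS7 service applications */
};

enum reload_state { RS_IDLE, RS_LOADING, RS_BACKUP };

struct reload_ctx { enum reload_state state; };

int reload_step(struct reload_ctx *ctx, enum reload_event ev)
{
    assert(ctx != NULL);
    switch (ev) {
    case EV_BOARD_FAILED:
        ctx->state = RS_LOADING;
        return ACT_LOAD_BOARD;
    case EV_STARTING:
        if (ctx->state != RS_LOADING)
            return ACT_NONE; /* no download outstanding; ignore */
        ctx->state = RS_BACKUP;
        /* Signaling links and M3UA associations recover on their own;
           only the backup role and the service bindings need action. */
        return ACT_SET_BACKUP | ACT_REBIND;
    }
    return ACT_NONE;
}
```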

After the reload and rebind, the TUP, TCAP, and SCCP tasks automatically resynchronize with the primary TX board, so for those layers the backup is then ready to take over operation. ISUP is different: the backup ISUP layer considers all circuits to be idle, so the application must resynchronize it by checkpointing the state of every non-idle circuit through the ISUP service. Once this is done, the recovered board is ready to take over the primary role if needed.
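Deciding which circuits to checkpoint amounts to filtering the application's own circuit table for non-idle entries. The following sketch assumes the application keeps such a table. `circuit`, `circuit_state`, and `circuits_to_checkpoint` are hypothetical, and the actual ISUP checkpoint call is left to the caller because the text does not name it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-circuit bookkeeping kept by the application. */
enum circuit_state { CKT_IDLE, CKT_SETUP, CKT_ANSWERED, CKT_RELEASING };

struct circuit {
    int                cic;   /* circuit identification code */
    enum circuit_state state; /* application's view of the circuit */
};

/* Collects the CICs of all non-idle circuits. The rebound backup ISUP layer
   treats every circuit as idle, so these are the circuits whose state must
   be checkpointed through the ISUP service. Writes at most out_cap CICs to
   out and returns how many were written. */
size_t circuits_to_checkpoint(const struct circuit *table, size_t n,
                              int *out, size_t out_cap)
{
    size_t count = 0;
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n && count < out_cap; i++) {
        if (table[i].state != CKT_IDLE)
            out[count++] = table[i].cic;
    }
    return count;
}
```

Idle circuits are skipped because they already match what the backup assumes, which keeps the checkpoint traffic proportional to active calls rather than to the size of the circuit table.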

If the board does not reload cleanly (the HMI_EVN_STARTING event is not received within a reasonable time), as can happen with a true hardware failure, call hmiHaltBoard to stop the board. Recovering the failed board then requires manual intervention.
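The "reasonable time period" check can be expressed as a pure function of timestamps that the application evaluates periodically. In the following sketch, `check_reload`, `reload_verdict`, and the `timeout_ms` parameter are hypothetical. The text does not specify a timeout value, so it is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum reload_verdict { RELOAD_PENDING, RELOAD_COMPLETE, RELOAD_HALT_BOARD };

/* Decides what to do with a board reloaded at load_ms.
   starting_seen: HMI_EVN_STARTING has arrived for this board.
   timeout_ms:    site-chosen limit (hypothetical; not an HMI value). */
enum reload_verdict check_reload(uint64_t load_ms, uint64_t now_ms,
                                 bool starting_seen, uint64_t timeout_ms)
{
    assert(now_ms >= load_ms);
    if (starting_seen)
        return RELOAD_COMPLETE;   /* proceed with hmiBackup */
    if (now_ms - load_ms >= timeout_ms)
        return RELOAD_HALT_BOARD; /* call hmiHaltBoard; repair manually */
    return RELOAD_PENDING;        /* keep waiting */
}
```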

Backup board failure

A backup board failure is detected and reported in the same way as a primary board failure. No switchover is needed, because the primary board keeps carrying traffic. The failed board is typically reloaded (if possible) and placed in the backup state, and applications must rebind with their service providers. As in the primary case, the primary MTP 3 automatically activates any failed signaling links that terminate on the reloaded board. If M3UA is configured as an ASP or IPSP client, the reloaded board automatically re-establishes its SIGTRAN associations.
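The two scenarios differ mainly in whether a switchover comes first. The following sketch builds an ordered recovery plan for either case. `recovery_step` and `build_recovery_plan` are hypothetical. Including `STEP_CHECKPOINT_ISUP` for a reloaded backup is an assumption carried over from the primary-failure description, where the rebound backup ISUP layer treats all circuits as idle. Waiting for the now-primary indication is omitted because it gates traffic, not the board reload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum recovery_step {
    STEP_SWITCHOVER,     /* hmiPrimary on the surviving backup board */
    STEP_RELOAD,         /* hmiLoadBoard on the failed board */
    STEP_WAIT_STARTING,  /* wait for HMI_EVN_STARTING (halt on timeout) */
    STEP_SET_BACKUP,     /* hmiBackup on the reloaded board */
    STEP_REBIND,         /* rebind applications to service providers */
    STEP_CHECKPOINT_ISUP /* assumption: checkpoint non-idle ISUP circuits */
};

/* Writes up to cap steps to out and returns the full plan length. */
size_t build_recovery_plan(bool failed_was_primary, bool uses_isup,
                           enum recovery_step *out, size_t cap)
{
    enum recovery_step plan[6];
    size_t n = 0;
    assert(out != NULL || cap == 0);
    if (failed_was_primary)
        plan[n++] = STEP_SWITCHOVER; /* only the primary case switches over */
    plan[n++] = STEP_RELOAD;
    plan[n++] = STEP_WAIT_STARTING;
    plan[n++] = STEP_SET_BACKUP;
    plan[n++] = STEP_REBIND;
    if (uses_isup)
        plan[n++] = STEP_CHECKPOINT_ISUP;
    for (size_t i = 0; i < n && i < cap; i++)
        out[i] = plan[i];
    return n;
}
```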

After a backup board running TUP, TCAP, and/or SCCP is brought back into service, it is ready to take over, because TUP, TCAP, and SCCP automatically resynchronize with the primary TX board.