Interruption to service

Incident Report for IPTelecom

Resolved

At 10:11am on 10th January, an incident occurred with the main IP Telecom database cluster. The database stopped responding to queries, which caused calls to drop and SIP registrations to fail. The failure was caused by an edge case in the database cluster platform, in which a node fails without signalling its failure to the remaining nodes. The remaining nodes then block until the failed node either dies completely or recovers.
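For readers unfamiliar with this class of failure, the sketch below illustrates one common, general way a cluster can notice a node that has stopped responding without announcing it: a heartbeat timeout. It is purely illustrative and is not a description of our database vendor's mechanism or of our configuration. The class `HeartbeatMonitor`, the node names `db-1` and `db-2`, and the 5-second timeout are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical helper: remembers when each peer node last sent a heartbeat."""

    timeout_s: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        # Store the time (in seconds) at which this node was last heard from.
        self.last_seen[node] = now

    def suspected_failed(self, now: float) -> list[str]:
        # A node is suspected once it has been silent for longer than the
        # timeout, even though it never reported a failure itself.
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Illustrative timeline with made-up node names and times.
monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.record_heartbeat("db-1", now=0.0)
monitor.record_heartbeat("db-2", now=0.0)
monitor.record_heartbeat("db-1", now=4.0)  # db-2 has gone silent

suspects = monitor.suspected_failed(now=6.0)
print(suspects)
```

In this sketch, a node that goes silent is flagged by its peers rather than being waited on indefinitely. Real cluster platforms combine detection of this kind with quorum and fencing rules, which are outside the scope of this illustration.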

The infrastructure team made a number of attempts to re-establish service. Unfortunately, one of these interventions compounded the issue and caused a cascading failure in a number of other components of the Hosted PBX platform. These secondary failures were recovered within a few minutes.

At 10:40am the database layer was successfully restarted. Calls and registrations immediately started working again. No further issues have been observed since that time.

We have a ticket open with our database vendor, and are actively working with them to ensure this issue does not occur again.
Posted Jan 10, 2022 - 13:39 GMT

Update

We are continuing to monitor for any further issues.
Posted Jan 10, 2022 - 12:18 GMT

Monitoring

Service has been restored. We are investigating the root cause of the issue and continue to monitor the system.
Posted Jan 10, 2022 - 10:48 GMT

Update

We are continuing to investigate this issue.
Posted Jan 10, 2022 - 10:47 GMT

Investigating

There are issues with calls on trunks and hosted services at the moment. We are investigating.
Posted Jan 10, 2022 - 10:31 GMT
This incident affected: Hosted PBX and SIP Trunking.