February 12 2016 4:57-5:05pm: System interruption

The Ocean system is currently offline due to a misconfiguration relating to the final step of migrating to our new data centre (changing the DB replication settings). We are in the process of correcting the misconfiguration.

2016-02-12 5:04pm: The system is back online. Total downtime was 7 minutes. A root cause analysis will follow shortly.

Root cause analysis: 

Background: the Ocean database is replicated in real time from Toronto to Vancouver for redundancy. Since the February 2nd migration to the new Toronto data centre, we've been running a third replicated database -- in our old Toronto data centre -- as an insurance policy.

Cause: as part of the final stage of the migration to our new data centre, our old Toronto data centre and its replicated database were decommissioned at 5pm today. Unexpectedly, this disrupted the replication set in such a way that no database was elected "primary", leaving the application with no active database connection. The replication set was repaired manually and the system was brought back online.

Actions: we will investigate the underlying misconfiguration that allowed the unavailability of a secondary node to disrupt the primary.
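
Illustration only: this post does not name the database software or its election rules, and the actual misconfiguration is still under investigation. Assuming a majority-vote election of the kind many replication sets use, the sketch below shows why losing one secondary out of three should normally leave a primary in place, and how a set can end up with no primary when too few configured voters are reachable. The member names and the function are hypothetical.

```python
def can_elect_primary(voting_members, reachable):
    """Return True if a strict majority of the configured voting
    members is reachable (assumed majority-vote election rule)."""
    votes_needed = len(voting_members) // 2 + 1
    votes_available = len(set(voting_members) & set(reachable))
    return votes_available >= votes_needed


# Hypothetical member names for the three replicated databases.
members = ["toronto-new", "vancouver", "toronto-old"]

# Old node shut down, other two healthy: 2 of 3 votes, a primary can be elected.
print(can_elect_primary(members, ["toronto-new", "vancouver"]))

# Old node shut down and only one remaining voter reachable: 1 of 3 votes, no primary.
print(can_elect_primary(members, ["toronto-new"]))
```

Under this assumed rule, the first case keeps a primary and the second does not, which is why the loss of a single secondary points to a configuration problem and not to expected behaviour.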
