May 18, 11:43 EDT
We've fully rolled out our alternative fix. This should give us more resiliency to node failures. We will further investigate the node failures in the morning.
May 16, 01:13 EDT
We've nearly completed the rollout of our alternative fix. There is currently no disruption in the system.
May 16, 00:45 EDT
The rollout of our alternative fix is going well. We are continuing to roll it out slowly.
May 16, 00:25 EDT
All backlogged work has been completed. The fix performed well in our test environment, and we are going to roll it out slowly in production.
May 16, 00:07 EDT
We're continuing to process backlogged work. We're testing alternative fixes in our test environment.
May 15, 23:39 EDT
We are still investigating the reason these nodes are failing. So far no additional nodes have failed. We're working through backlogged work.
May 15, 23:20 EDT
More nodes have gone down. We are trying to narrow down why the nodes are running into issues.
May 15, 23:07 EDT
Another node in our messaging infrastructure has gone down. We have restored our messaging infrastructure and are still working on resolving the issue. Some false check alerts have gone out.
May 15, 22:51 EDT