Yesterday, starting at 11:25 AM Eastern, we experienced a degradation in our messaging infrastructure. This caused outages in our ingest API from 11:36 AM to 11:52 AM and again from 12:16 PM to 12:24 PM Eastern. For the duration of the incident, collection of monitoring data for cloud elements was degraded, with a loss of data for most elements between 11:25 AM and 1:15 PM Eastern. There were no false check alarms during the incident.
During this incident, our messaging infrastructure was degraded more severely than we have experienced before. We have increased the resources in our messaging cluster, which helped bring the cluster back online and will improve performance going forward. Our messaging infrastructure is critical to our ingestion pipeline, and we are working with AWS to improve the performance and reliability of this subsystem.