Data Ingest and Data Processing degraded
Incident Report for CloudWisdom
Postmortem

Yesterday, starting at 11:25 AM Eastern, our messaging infrastructure became degraded. This caused outages in our ingest API from 11:36 AM to 11:52 AM Eastern and from 12:16 PM to 12:24 PM Eastern. Throughout the incident, collection of monitoring data for cloud elements was degraded, with data loss for most elements between 11:25 AM and 1:15 PM Eastern. There were no false check alarms during the incident.

During this incident, our messaging infrastructure was more severely degraded than we have experienced before. We have increased the resources in our messaging cluster, which helped bring the cluster back online and will improve performance going forward. Our messaging infrastructure is critical to our ingestion pipeline, and we are working with AWS to improve the performance and reliability of this subsystem.

Posted Dec 02, 2020 - 13:36 EST

Resolved
Cloud collection is back online and we have caught up on all backlogged data. We found that there was data loss for cloud elements. We will post a full report within a business day.
Posted Dec 01, 2020 - 13:46 EST
Monitoring
Cloud collection has been fixed and we are monitoring the results.
Posted Dec 01, 2020 - 13:34 EST
Update
We are making progress bringing cloud collection back online. Thank you for your patience.
Posted Dec 01, 2020 - 13:27 EST
Update
We've stabilized our messaging infrastructure and are working to restore cloud collection.
Posted Dec 01, 2020 - 13:10 EST
Update
We're making progress on stabilizing our messaging infrastructure, but it is not yet fully operational.
Posted Dec 01, 2020 - 12:58 EST
Update
We're still working to get our messaging infrastructure back online. Cloud collection is still delayed.
Posted Dec 01, 2020 - 12:47 EST
Update
There is no new information to share at the moment. We're still working to fully restore cloud data ingest.
Posted Dec 01, 2020 - 12:34 EST
Update
The system continues to catch up on backlogged work. We're working to fully restore cloud data ingest.
Posted Dec 01, 2020 - 12:20 EST
Identified
We are experiencing lag in the system, which is causing our data processing to fall behind.
Posted Dec 01, 2020 - 12:09 EST
This incident affected: Data Ingestion and Data Processing.