Partial outage in our Ingest subsystem [RESOLVED]
Incident Report for CloudWisdom
Postmortem

On Saturday Oct 17th, 2020 between 1:35 PM and 1:45 PM EST, our ingestion pipeline buffers experienced a cascading failure as the ran out of memory. Due to previous fixes this only resulted in data loss for particular element types and resulted in no false check alarms. We have since decreased their individual capacity to throttle the data input per host which we expect to mitigate against these types of failures in the future.

Posted Oct 20, 2020 - 10:16 EDT

Resolved
Today between 1:35 PM and 1:45 PM Eastern time we had an issue with our Ingest subsystem. It caused a partial degradation and data loss for AWS and Azure cloud data collection. As a result, no analytics and alerting were happening during this time. Issue has been fully resolved and we will post a detailed incident summary soon.
Posted Oct 17, 2020 - 06:30 EDT