Broker metrics and metrics integrations degradation
Window: 2026-04-23 03:25 – 05:35 UTC (2h 10m)
Broker metrics (connections, channels, queues, consumers, message rates, node and netsplit state) were degraded, progressively dropping until recovery at 05:35 UTC. Downstream effects:
Not affected: server metrics (CPU, memory, disk) and their corresponding alarms, Notice alarms, broker availability, and the metric graphs shown in the CloudAMQP console (these use a separate data path
and continued to update normally).
Broker metrics are polled from each cluster's management HTTP API by a pool of collector workers and published to an internal message bus that both the Alarms service and Metrics Integrations consume
from. At 03:25 UTC this service became unresponsive without raising an error. Affected workers silently stopped polling while remaining alive to the platform, so automatic restart did not trigger. On-call
staff began investigating around 05:00 UTC; force-restarting the service restored metric flow.
The exact trigger has not been identified. Our focus is on ensuring the condition is detected and handled promptly if it recurs.
CloudAMQP offers a new generation of metrics integrations based on Prometheus, with a re-engineered pipeline that does not rely on these centralised services — each server forwards data directly to your
endpoint. These have been running in production for some time and have proved very reliable.