ESI Incident Report - 26 Feb 2025
Hi all,
Yesterday just after downtime about 50% of the ESI endpoints did not return to their normal state. This incident was caused by a misconfiguration in the RabbitMQ broker that sits between ESI and the Monolith, and prevented the Monolith from receiving (and responding to) requests from ESI. This caused ESI requests to stall and error out after their configured timeout of 10 seconds. ESI requests that were routed through Quasar were not affected by this issue.
This incident occurred during scheduled maintenance consisting of a blue-green deployment of RabbitMQ, where one cluster (green in this case) was failed-over to the new blue cluster. After diagnosing and resolving this configuration issue, the Monolith could receive request from ESI again, and all endpoints returned to their normal state. Deployment scripts have been adjusted to prevent this issue in the future.
For more details on how ESI communicates with the rest of the EVE ecosystem, a new Dev Blog is available here as part of EVE Evolved.