Slow performance on feed API endpoints
Incident Report for getstream.io
Postmortem

The problem

Due to what appears to be a bug in EC2 Security Groups, connectivity between one API server and one Redis backend was impaired. As a result, API requests on that server waited until a hard timeout occurred. At its peak, 1% of all API calls were affected; they either returned a 502 error code or raised client-side timeout exceptions.

Mitigation

Once the problem was identified, the EC2 server with the configuration problem was removed from the load balancer, which immediately resolved the issue.

Solution

We are working with AWS support to isolate and validate this problem. In the meantime, we have instrumented all our API servers to proactively check for this specific issue and to decommission any server experiencing the same problem.
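
As an illustration only, a connectivity check of this kind could look like the sketch below. It is not our production instrumentation: the function names (`backend_reachable`, `unreachable_backends`, `should_decommission`), the one-second timeout, and the rule "decommission if any backend is unreachable" are assumptions made for the example. It uses only the Python standard library and tests plain TCP reachability, which is the layer a security group problem would affect.

```python
import socket


def backend_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts and refused
        # connections) when the backend cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_backends(backends, timeout=1.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(host, port) for host, port in backends
            if not backend_reachable(host, port, timeout)]


def should_decommission(backends, timeout=1.0):
    """Hypothetical policy: flag this server if any backend is unreachable."""
    return len(unreachable_backends(backends, timeout)) > 0
```

A server running such a check periodically against its list of Redis backends could report itself as unhealthy to the load balancer when `should_decommission` returns True, so that traffic stops reaching it before requests begin to hit the hard timeout.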

Posted Dec 22, 2016 - 13:34 UTC

Resolved
The issue has been resolved. We're still investigating the root cause.
Posted Dec 21, 2016 - 09:55 UTC
Investigating
We're investigating a slowdown on our main feed API endpoint.
Posted Dec 21, 2016 - 09:14 UTC