getstream.io Status - Incident History (updated 2024-03-29T05:22:55Z)

High error rate for Chat Query Channels endpoint (2024-03-28T11:30:20Z)
<p><small>Mar <var data-var='date'>28</var>, <var data-var='time'>11:30</var> UTC</small><br><strong>Resolved</strong> - An issue with the QueryChannel endpoint caused some queries to return an HTTP 403 response code. This incident has been resolved.</p>

Realtime connections outage (2023-05-09T08:33:10Z)
<p><small>May <var data-var='date'> 9</var>, <var data-var='time'>08:33</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>May <var data-var='date'> 9</var>, <var data-var='time'>08:08</var> UTC</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p>
<p><small>May <var data-var='date'> 9</var>, <var data-var='time'>07:35</var> UTC</small><br><strong>Identified</strong> - The issue has been identified and a fix is being implemented.</p>
<p><small>May <var data-var='date'> 9</var>, <var data-var='time'>07:32</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating an issue with our Feed Realtime service.</p>

Elevated error rate in our edge network (2023-03-27T15:12:07Z)
<p><small>Mar <var data-var='date'>27</var>, <var data-var='time'>15:12</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>Mar <var data-var='date'>27</var>, <var data-var='time'>14:35</var> UTC</small><br><strong>Identified</strong> - The issue has been identified and our team is working on a remediation.</p>

Elevated error rate for Feed apps in the Dublin region (2023-03-21T17:09:27Z)
<p><small>Mar <var data-var='date'>21</var>, 
<var data-var='time'>17:09</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>Mar <var data-var='date'>21</var>, <var data-var='time'>16:56</var> UTC</small><br><strong>Identified</strong> - The issue has been identified and a fix is being implemented.</p>

AWS connectivity issues (2022-07-28T18:37:04Z)
<p><small>Jul <var data-var='date'>28</var>, <var data-var='time'>18:37</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>Jul <var data-var='date'>28</var>, <var data-var='time'>18:36</var> UTC</small><br><strong>Update</strong> - The issue has been resolved and the service in the Ohio region is operating normally.</p>
<p><small>Jul <var data-var='date'>28</var>, <var data-var='time'>18:27</var> UTC</small><br><strong>Identified</strong> - We are experiencing a partial outage for selected apps in the Ohio region due to AWS connectivity issues.</p>

Elevated API Errors on us-east (2021-12-08T12:43:43Z)
<p><small>Dec <var data-var='date'> 8</var>, <var data-var='time'>12:43</var> UTC</small><br><strong>Resolved</strong> - The incident has been resolved. A post-mortem will follow.</p>
<p><small>Dec <var data-var='date'> 8</var>, <var data-var='time'>11:04</var> UTC</small><br><strong>Update</strong> - This morning's issue propagated to an additional component of our infrastructure, the one that dispatches messages to end users over the WebSocket protocol. Our team has worked to mitigate the issue, and the problem appears to be resolved now. 
We are still monitoring the situation closely.</p>
<p><small>Dec <var data-var='date'> 8</var>, <var data-var='time'>09:02</var> UTC</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p>
<p><small>Dec <var data-var='date'> 8</var>, <var data-var='time'>08:55</var> UTC</small><br><strong>Identified</strong> - The issue has been identified and a fix is being implemented. A temporary remediation has been put in place to mitigate the ongoing issue.</p>
<p><small>Dec <var data-var='date'> 8</var>, <var data-var='time'>07:20</var> UTC</small><br><strong>Investigating</strong> - We're experiencing an elevated level of API errors and are currently looking into the issue. This issue affects only one shard in our us-east region.</p>

Elevated API error rate in Dublin (2021-08-31T21:30:00Z, updated 2021-09-01T09:59:31Z)
<p><small>Aug <var data-var='date'>31</var>, <var data-var='time'>21:30</var> UTC</small><br><strong>Resolved</strong> - Traffic to our Dublin infrastructure experienced an elevated error rate due to an AWS outage.<br />The incident started at 11:20PM, the error rate decreased at 11:38PM, and the incident was resolved by 11:59PM.<br /><br />We are still performing impact and root-cause analysis; a post-mortem with more information will be posted here.</p>

Increased error rate on Chat API (2021-04-14T05:30:00Z, updated 2021-04-14T07:04:02Z)
<p><small>Apr <var data-var='date'>14</var>, <var data-var='time'>05:30</var> UTC</small><br><strong>Resolved</strong> - We experienced higher-than-normal error rates on the Chat API during database maintenance. 
The increase in errors started at 5:24AM and was resolved at 5:42AM UTC.</p>

Chat API (2021-03-15T18:30:00Z, updated 2021-03-15T19:11:54Z)
<p><small>Mar <var data-var='date'>15</var>, <var data-var='time'>18:30</var> UTC</small><br><strong>Resolved</strong> - High error rate on Chat HTTP APIs.</p>

High error rate on Feed Realtime endpoint (2021-01-06T15:35:16Z)
<p><small>Jan <var data-var='date'> 6</var>, <var data-var='time'>15:35</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>Jan <var data-var='date'> 6</var>, <var data-var='time'>10:02</var> UTC</small><br><strong>Update</strong> - Realtime updates for feeds are back to normal; we are still monitoring the traffic. <br /><br />Unfortunately, the previous patch did not resolve the problem and was causing realtime clients to retry the connection via the `Client not found, please reconnect` response.</p>
<p><small>Jan <var data-var='date'> 6</var>, <var data-var='time'>09:53</var> UTC</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p>
<p><small>Jan <var data-var='date'> 6</var>, <var data-var='time'>05:00</var> UTC</small><br><strong>Identified</strong> - The issue has been identified and a fix is being implemented.</p>

Feed Realtime - SQS high error rate (2021-01-04T23:25:00Z)
<p><small>Jan <var data-var='date'> 4</var>, <var data-var='time'>23:25</var> UTC</small><br><strong>Resolved</strong> - Millions of requests to the handshake endpoint of our feed realtime system broke the API. 
This issue has been resolved and a full post-mortem will follow.</p>
<p><small>Jan <var data-var='date'> 4</var>, <var data-var='time'>19:45</var> UTC</small><br><strong>Update</strong> - We are continuing to monitor for any further issues.</p>
<p><small>Jan <var data-var='date'> 4</var>, <var data-var='time'>17:38</var> UTC</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p>
<p><small>Jan <var data-var='date'> 4</var>, <var data-var='time'>16:57</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating an issue with AWS SQS: we are receiving a 100% error rate from the SQS APIs.<br /><br />Our feeds realtime endpoint is currently unable to push notifications to SQS.</p>

Elevated error rates on Chat API (2020-12-09T17:43:15Z, updated 2020-12-10T20:37:32Z)
<p><small>Dec <var data-var='date'> 9</var>, <var data-var='time'>17:43</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>Dec <var data-var='date'> 9</var>, <var data-var='time'>17:20</var> UTC</small><br><strong>Identified</strong> - We identified an issue with the Chat API that caused some messages not to be delivered via WebSockets. The problem is already resolved for most applications, and the remediation should be completed for all apps shortly.</p>
<p><small>Dec <var data-var='date'> 9</var>, <var data-var='time'>16:59</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating an increase in errors on the Chat API.</p>

Increased API latency (2020-07-28T12:12:17Z, updated 2020-07-28T16:13:44Z)
<p><small>Jul <var data-var='date'>28</var>, <var data-var='time'>12:12</var> UTC</small><br><strong>Resolved</strong> - The AWS networking issue is now resolved. We are now cleaning up our temporary remediations, since they are no longer needed. 
Traffic has been back to normal for the last hour.</p>
<p><small>Jul <var data-var='date'>28</var>, <var data-var='time'>10:37</var> UTC</small><br><strong>Identified</strong> - Due to a networking issue in the AWS us-east region, we are experiencing increased latency for some of the traffic in our US region. We are mitigating the problem while waiting for a final remediation of the AWS infrastructure.</p>

High error rates and timeouts (2020-01-28T16:58:54Z, updated 2020-01-28T17:21:12Z)
<p><small>Jan <var data-var='date'>28</var>, <var data-var='time'>16:58</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>Jan <var data-var='date'>28</var>, <var data-var='time'>16:57</var> UTC</small><br><strong>Update</strong> - We are continuing to monitor for any further issues.</p>
<p><small>Jan <var data-var='date'>28</var>, <var data-var='time'>16:45</var> UTC</small><br><strong>Update</strong> - We are continuing to monitor for any further issues.</p>
<p><small>Jan <var data-var='date'>28</var>, <var data-var='time'>16:12</var> UTC</small><br><strong>Monitoring</strong> - A recent release increased load on part of the chat infrastructure, causing degraded performance and timeout errors. Remediation is in progress.</p>

Timeout Errors (2019-11-21T22:16:44Z)
<p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>22:16</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>22:04</var> UTC</small><br><strong>Monitoring</strong> - Increased load on some API endpoints caused intermittent traffic spikes. 
Adding more capacity remediated the problem.</p>
<p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>21:27</var> UTC</small><br><strong>Investigating</strong> - We are experiencing spikes of timeout errors; the team is investigating the root cause and working on a remediation.</p>

Emails from Dashboard are not sent (2019-08-30T16:36:17Z)
<p><small>Aug <var data-var='date'>30</var>, <var data-var='time'>16:36</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>Aug <var data-var='date'>30</var>, <var data-var='time'>14:46</var> UTC</small><br><strong>Identified</strong> - Emails from the Dashboard (invites, password resets and other notifications) are currently not being sent correctly.<br />We are in contact with our SMTP provider (Mailgun) to resolve this issue as soon as possible.</p>

Dashboard redirect issue (2019-05-10T21:34:11Z)
<p><small>May <var data-var='date'>10</var>, <var data-var='time'>21:34</var> UTC</small><br><strong>Resolved</strong> - This issue has been resolved.</p>
<p><small>May <var data-var='date'>10</var>, <var data-var='time'>20:40</var> UTC</small><br><strong>Investigating</strong> - The dashboard has a bug that is redirecting some users to the homepage. Our team is investigating. APIs are fully operational; this only impacts the dashboard.</p>

Elevated API Errors on US-EAST (2019-02-06T14:23:35Z, updated 2019-02-06T14:25:33Z)
<p><small>Feb <var data-var='date'> 6</var>, <var data-var='time'>14:23</var> UTC</small><br><strong>Resolved</strong> - We experienced an elevated level of API errors in our us-east region. 
This incident lasted from 2:11PM to 2:18PM UTC.</p>

Elevated API Errors in region EU-WEST (2019-01-02T13:22:46Z)
<p><small>Jan <var data-var='date'> 2</var>, <var data-var='time'>13:22</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p>
<p><small>Jan <var data-var='date'> 2</var>, <var data-var='time'>13:04</var> UTC</small><br><strong>Monitoring</strong> - We were experiencing an elevated level of API errors because of an unsuccessful Redis upgrade. We have resolved the issue and are monitoring for further problems.</p>

EU-WEST API downtime (2018-11-22T13:08:04Z, updated 2018-11-22T13:16:10Z)
<p><small>Nov <var data-var='date'>22</var>, <var data-var='time'>13:08</var> UTC</small><br><strong>Resolved</strong> - Due to an operational mistake, the API had a very high error rate in the Europe West region between 12:56PM and 12:58PM UTC. The problem has been mitigated and resolved.<br /><br />Detailed API error rate over time:<br /><br />12:56PM 78%<br />12:57PM 93%<br />12:58PM 4%</p>

Partial API outage (2018-08-23T15:59:01Z, updated 2018-08-23T16:43:09Z)
<p><small>Aug <var data-var='date'>23</var>, <var data-var='time'>15:59</var> UTC</small><br><strong>Resolved</strong> - Between 03:59PM and 04:32PM UTC, API traffic resulted in HTTP errors or timeouts. Only some of the Stream applications hosted in the US were affected by this problem.</p>

Realtime Redis Failover (2018-08-08T01:09:10Z)
<p><small>Aug <var data-var='date'> 8</var>, <var data-var='time'>01:09</var> UTC</small><br><strong>Resolved</strong> - Our distributed realtime cluster uses Redis (on ElastiCache) for state management. A failover of the ElastiCache cluster made realtime unavailable for 7 minutes. 
This issue has been resolved and we're investigating why the failover took 7 minutes. This impacted customers using Stream's WebSocket, SQS, or Webhook firehose systems.</p>

Realtime WS connections outage (2018-06-21T18:02:23Z, updated 2018-06-21T18:02:24Z)
<p><small>Jun <var data-var='date'>21</var>, <var data-var='time'>18:02</var> UTC</small><br><strong>Resolved</strong> - Connections are back; we are currently investigating the root cause.</p>

Elevated API Errors (2018-04-26T07:49:24Z)
<p><small>Apr <var data-var='date'>26</var>, <var data-var='time'>07:49</var> UTC</small><br><strong>Resolved</strong> - Between 7:24AM and 7:26AM UTC, we had an increased error rate due to a database failover procedure triggered by AWS.</p>

Elevated API Errors (2018-04-13T13:33:18Z)
<p><small>Apr <var data-var='date'>13</var>, <var data-var='time'>13:33</var> UTC</small><br><strong>Resolved</strong> - We experienced an elevated rate of API errors between 13:00 and 13:01 UTC. The issue resolved itself and we are looking into the cause.</p>