Slow load times and outages across products

Incident Report for iPaper

Postmortem

What happened

On Monday, February 6th, 2023 at 12:26 PM, we released a new version of our software. Our automated checks then notified us of partial outages across our Flipbooks viewer and our admin application. We immediately began investigating, as our logs showed a high volume of errors, and found that our primary viewer domain “viewer.ipaper.io” was redirecting all visitors to our admin application. This flooded the admin with requests that should have gone to the viewer, and the sudden load caused timeouts and partial outages there as well.

Why it happened

In the week before the incident, we made a faulty code change to our internal deployment tooling. With this change, each deployment would incorrectly configure our infrastructure to listen for our viewer domain on the admin website. The updated tooling was then rolled out along with our regular release.
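
To illustrate the kind of misconfiguration described above, here is a minimal sketch of a pre-deployment check that flags a hostname bound to the wrong application. It is not our actual tooling: the function name, the data layout, and the `admin.example.com` placeholder are assumptions made purely for illustration.

```python
# Hypothetical mapping of hostname -> application expected to serve it.
# "admin.example.com" is a placeholder, not a real domain of ours.
EXPECTED_OWNERS = {
    "viewer.ipaper.io": "viewer",
    "admin.example.com": "admin",
}


def find_binding_errors(bindings, expected=EXPECTED_OWNERS):
    """Return a message for every hostname bound to the wrong application.

    bindings: dict mapping an application name to the list of hostnames
    that the generated configuration makes it listen on.
    """
    errors = []
    for app, hosts in bindings.items():
        for host in hosts:
            owner = expected.get(host)
            # Hostnames without a known owner are ignored in this sketch.
            if owner is not None and owner != app:
                errors.append(f"{host} is bound to '{app}' but belongs to '{owner}'")
    return errors


# A configuration resembling the incident: the viewer domain ends up on admin.
faulty = {"viewer": [], "admin": ["admin.example.com", "viewer.ipaper.io"]}
for message in find_binding_errors(faulty):
    print(message)
```

A check along these lines could run against the generated configuration before it is applied, failing the deployment if any message is returned.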

How we fixed it

After discovering the cause, we manually fixed the redirects and saw load drop on both our admin and our viewer, returning the platform to normal operation. We then made the necessary code changes and redeployed all applications, returning us to a fully functional state.

What we are doing to prevent it from happening again

We are documenting the steps that led up to this very uncommon error. We are also adding extra steps to catch it in the future and including them in our incident response team's checklist.
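
As one example of what such an extra step might look like, the sketch below classifies an HTTP response as a redirect to a different host. It is a hypothetical illustration, not a description of our actual checks: the function name and the `admin.example.com` placeholder are assumptions, and a real check would first fetch the response from the live domain.

```python
from urllib.parse import urlsplit

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def redirects_to_other_host(request_url, status, location):
    """Return True if the response redirects to a different hostname."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    target_host = urlsplit(location).hostname
    # A relative Location header has no hostname and stays on the same host.
    if target_host is None:
        return False
    request_host = urlsplit(request_url).hostname or ""
    return target_host.lower() != request_host.lower()


# The pattern seen in the incident: the viewer domain redirecting elsewhere.
print(redirects_to_other_host(
    "https://viewer.ipaper.io/", 302, "https://admin.example.com/login"))
```

Run right after a release against the viewer domain, a check like this would raise an alert as soon as visitors were being sent to another application.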

We sincerely apologize for this unscheduled downtime and for how it has affected you and your customers. Thank you so much for your understanding and patience while we resolved this challenge 🙏

Posted Feb 08, 2023 - 13:16 CET

Resolved

This incident has now been fully resolved and all systems are operational and fully stable.
We appreciate your understanding and patience while our team worked quickly to resolve this 🙏
Posted Feb 06, 2023 - 15:36 CET

Monitoring

We have identified the cause and deployed a fix, and systems are returning to normal.
We will monitor the situation before closing this issue; we should be fully operational again.
Posted Feb 06, 2023 - 14:14 CET

Investigating

We are currently seeing very slow load times across our products, which is causing timeouts for several of them.
We are working intensively to investigate the cause and will update as we learn more.
Posted Feb 06, 2023 - 12:56 CET
This incident affected: Flipbooks (Viewer, Admin, Backend API, Display Viewer, Pop-ups).