Slow load times and outages across products
Incident Report for iPaper
Postmortem

What happened

On Monday, February 6th, 2023 at 12:26 PM, we released a new version of our software, and our automated checks notified us of partial outages across our Flipbooks viewer and our admin application. We started investigating immediately, as our logs showed a high volume of errors, and we found that our primary domain “viewer.ipaper.io” was redirecting all visitors to our admin application. This flooded the admin with requests that should have gone to the viewer. The sudden high volume of requests caused timeouts and partial outages in the admin as well.

Why it happened

In the week before the incident, we deployed a faulty code change to our internal deployment tooling. On every deploy, this change incorrectly configured our infrastructure to listen for our viewer domain on the admin website. The tooling was then deployed along with our regular release.
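To illustrate the kind of misconfiguration described above, here is a minimal sketch of a pre-deploy check that compares generated host bindings against the site expected to serve each domain. It is not our actual tooling: the function name `find_binding_errors`, the site names `viewer` and `admin`, the data shapes, and the domain `admin.example.test` are all assumptions made for the example.

```python
from collections import defaultdict


def find_binding_errors(bindings, expected):
    """Return human-readable errors for host bindings that differ from expectations.

    bindings: list of (site, hostname) pairs as a deploy tool might generate them
              (hypothetical shape).
    expected: dict mapping hostname -> the one site that should serve it.
    """
    expected = {host.lower(): site for host, site in expected.items()}

    # Collect every site that claims each hostname.
    sites_by_host = defaultdict(set)
    for site, host in bindings:
        sites_by_host[host.lower()].add(site)

    errors = []
    # A known hostname must not be bound to any site other than its owner.
    for host, sites in sorted(sites_by_host.items()):
        owner = expected.get(host)
        if owner is None:
            continue  # hostname not covered by this check
        for site in sorted(sites - {owner}):
            errors.append(
                f"{host} is bound to '{site}' but should only be served by '{owner}'"
            )
    # Every expected hostname must actually be bound to its owner.
    for host, owner in sorted(expected.items()):
        if owner not in sites_by_host.get(host, set()):
            errors.append(f"{host} is not bound to '{owner}'")
    return errors


if __name__ == "__main__":
    # Hypothetical faulty output: the viewer domain ends up on the admin site.
    faulty = [("admin", "viewer.ipaper.io"), ("admin", "admin.example.test")]
    expected = {"viewer.ipaper.io": "viewer", "admin.example.test": "admin"}
    for line in find_binding_errors(faulty, expected):
        print(line)
```

Run against the faulty bindings, the check reports both that the viewer domain is bound to the admin site and that it is missing from the viewer site, so a deploy could be stopped before the configuration is applied.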

How we fixed it

After discovering the cause, we manually fixed the redirects and saw load drop on both our admin and our viewer, returning the platform to normal operation. We then made the necessary code changes and re-deployed all applications, returning us to a fully functional state.

What we are doing to prevent it from happening again

We are documenting the steps that led up to this very uncommon error. We are also adding extra steps to catch it in the future, and these steps are being added to our incident response team's checklist.
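As one example of what such an extra step could look like, the sketch below flags a response from the viewer domain that redirects to a different host. It is an illustration only, not the check we run: the function name `redirects_off_host` and the domain `admin.example.test` are hypothetical, and it assumes the problem shows up as an HTTP redirect with a `Location` header, which would not cover a pure routing error.

```python
from urllib.parse import urlsplit

# HTTP status codes that instruct the client to follow a Location header.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirects_off_host(request_host, status, location):
    """Return True if a response redirects the visitor to a different host.

    request_host: the host that was requested, e.g. "viewer.ipaper.io".
    status:       HTTP status code of the response.
    location:     value of the Location header, or None if absent.
    """
    if status not in REDIRECT_STATUSES or not location:
        return False
    target_host = urlsplit(location).hostname
    # Relative redirects carry no hostname and stay on the requested host.
    if target_host is None:
        return False
    return target_host.lower() != request_host.lower()


if __name__ == "__main__":
    # Hypothetical responses observed by a post-deploy smoke test.
    print(redirects_off_host("viewer.ipaper.io", 302, "https://admin.example.test/login"))
    print(redirects_off_host("viewer.ipaper.io", 302, "/some/path"))
    print(redirects_off_host("viewer.ipaper.io", 200, None))
```

A smoke test could feed this function the status and `Location` header of a real request made right after a release, and fail the release if it returns `True`.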

We sincerely apologize for this unscheduled downtime and how it has affected you and your customers. Thank you so much for your understanding and patience while we resolved this challenge 🙏

Posted Feb 08, 2023 - 13:16 CET

Resolved
This incident has now been fully resolved and all systems are operational and fully stable.
We appreciate your understanding and patience while our team worked quickly to resolve this 🙏
Posted Feb 06, 2023 - 15:36 CET
Monitoring
We have identified the cause and deployed a fix, and systems are returning to normal.
We will monitor the situation before closing this issue. We should be fully operational again.
Posted Feb 06, 2023 - 14:14 CET
Investigating
We are currently seeing very slow load times of our products, and this is causing timeouts for several products.
We are working intensively to investigate the cause and will update as we learn more.
Posted Feb 06, 2023 - 12:56 CET
This incident affected: Admin, Flipbooks, Flipbooks Backend API, Display, and Pop-ups.