In the morning of March 29th, 2022 around 7:52 we were alerted that our main applications (Flipbooks - admin/viewer and embeds) were not responding to automated checks after auto-scaling had terminated two older instances. Upon investigating, we found that the new app-servers that were rolled into production automatically, while they were running and responding, had not yet received the application files and were missing crucial web server configuration.
The day prior to this incident, database changes had been pushed to production but were not properly reflected in our deployment; this led to a misconfiguration of our deployment software that caused it to halt before files and web server configuration were properly deployed.
Once we knew the root cause, we updated the deployment software with the missing changes and were able to redeploy to all servers manually. We then introduced new app-servers to test everything was working properly when doing automated scaling, and this was successful as well.
We are documenting the steps that lead up to this very edge-case error, and extra steps are added to catch it in the future.
We sincerely apologize for this unscheduled downtime and how it has affected you and your customers. Thank you so much for your understanding and patience, while we resolved this challenge 🙏