Last week (the week of April 7, 2013), Ordoro had two intermittent server outages. We take these issues seriously and do our best to avoid them. But, as with any software system, things sometimes fall through the cracks. We believe it is best to talk openly about these outages rather than sweep them under the rug. Transparency keeps our customers informed, shows how seriously we take these issues, and helps everyone understand the work that goes on behind the scenes to keep our systems up and running with minimal problems.
We had two sets of problems last week. The first one happened on Monday morning, and the second one on Friday morning.
Monday, April 8, 2013
We use a third-party service called iron.io to run all the Ordoro tasks related to importing orders from your carts, writing back tracking numbers, and keeping inventory in sync between your channels (Shopify, BigCommerce, Amazon, eBay, etc.). On April 7, we began seeing intermittent issues on our import-export servers. At first these issues did not affect our users, because the data transfer tasks were still completing without failure. On Monday morning, however, the issue became serious and started affecting order import for a segment of our customers. The issue was fully resolved by Monday afternoon.
When we first spotted the issue, we immediately got in touch with the folks at iron.io. They told us that they had been noticing issues in their logs and had already mobilized their entire engineering team to investigate and resolve the problem as quickly as possible. They stayed in constant touch with us via their support chat and Twitter. The iron.io team resolved the issue by Monday afternoon and wrote a detailed blog post about it the following day. (In fact, our blog post was inspired by the one written by the iron.io team.)
What we learned
We plan to monitor the status of our import-export servers more closely, so that we can react to issues faster and communicate them to our customers in a more timely fashion.
Friday, April 12, 2013
We launched a systemwide update to improve the search functionality in Ordoro. This feature allows our users to find orders, shipments, and products 10x faster than before. Unfortunately, this upgrade affected some users’ ability to import orders. Search requires real-time indexing as new orders are added, and a bug in the indexing process caused order creation to fail for users with a large number of suppliers in Ordoro.
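To illustrate the kind of failure described above, here is a minimal Python sketch of one way to keep a search-indexing error from blocking order creation. It is not Ordoro's actual code: `create_order`, `IndexingError`, `flaky_indexer`, and the in-memory `store` and `retry_queue` lists are all hypothetical names, and the supplier limit is made up purely for the example.

```python
import logging

logger = logging.getLogger("orders")


class IndexingError(Exception):
    """Raised by the hypothetical search indexer when an order cannot be indexed."""


def create_order(order, store, indexer, retry_queue):
    """Save an order, then try to index it for search.

    If indexing fails, the error is logged and the order id is queued
    for a later retry, so the order itself is still created.
    """
    store.append(order)  # persist the order first
    try:
        indexer(order)
    except IndexingError:
        logger.exception("indexing failed for order %s", order["id"])
        retry_queue.append(order["id"])
    return order


def flaky_indexer(order):
    """Stand-in indexer that fails for orders with many suppliers (made-up limit)."""
    if len(order["suppliers"]) > 2:
        raise IndexingError("too many suppliers to index")


if __name__ == "__main__":
    store, retry_queue = [], []
    create_order({"id": 1, "suppliers": ["a", "b", "c"]}, store, flaky_indexer, retry_queue)
    create_order({"id": 2, "suppliers": ["a"]}, store, flaky_indexer, retry_queue)
    print(len(store), retry_queue)
```

In this sketch both orders are saved even though indexing fails for the first one; the failed order is simply not searchable until a retry succeeds. Whether that trade-off is acceptable depends on the system.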
It was an all-hands-on-deck effort to discover the cause of this issue. Once our development team narrowed it down, it took 30 minutes to write the fix, test it in our development environment, and deploy it to production.
What we learned
We have added more tests to our development environment to catch errors like this before they reach production. We are also going to update our emergency response plan so that we can respond more quickly and precisely in the future.
We apologize to the users who were affected by these outages and for any inconvenience they caused. Please feel free to reach out if you have any concerns. You can email me (Kristen Tan) directly at email@example.com or call me at 512-271-9453 ext. 1.