Are there blind spots in your service assurance approach?
Netflix, a provider of online streaming media, made news over the holidays when customers experienced a service outage on Christmas Eve. Imagine unwrapping your new mobile device and deciding to try it out by streaming a movie. If you were in North America, you probably found that the Netflix movie streaming service was down.
This outage was caused by issues within Amazon Web Services, which Netflix uses to support movie streaming. Initially, the Amazon support team pursued API errors before learning that the root cause of the outage was actually a configuration issue caused by human error. This misstep delayed the restoration of service to Netflix customers. Over the course of that day, the configuration error first manifested as performance degradation and then cascaded into a full service outage for many customers. A more system-wide approach to service assurance might have helped avoid a situation like this one.




