The challenges of transforming software development to the Cloud Native paradigm can be examined through the lens of testing.
Our Netflix series demonstrates the radical nature of this transformation. To date, enterprise architecture has been built on the fundamental precept of a mostly static environment: a fixed set of applications, changed relatively infrequently, maintained in one or more strictly controlled data centres.
In contrast, Netflix now operates a global infrastructure spanning multiple AWS zones, running thousands of interoperating microservices, continually spawning new ones and auto-scaling their Cloud infrastructure; managing it is a process of Mastering Chaos.
As they repeatedly describe, testing is fundamental to this. They integrate the lessons they’ve learned into best practices, such as canary analysis and staged deployments, which are applied automatically to new code throughout their Continuous Deployment life-cycle through their use of Spinnaker.
What is especially notable is that they don’t just apply testing to writing and deploying new software; they also rigorously test the whole system.
In the Mastering Chaos presentation, Josh describes how they use techniques like Failure Injection Testing to simulate microservice failures. In the Microservices at Netflix Scale presentation, at 36:00, Ruslan demonstrates how they tested the failure of an entire AWS region.
Netflix have termed this approach ‘chaos engineering’: in short, they assume the system will fail and proactively test for, and simulate, that failure happening. At 11:00, Ruslan describes how they apply these principles in practice, such as their use of Chaos Monkey to automate failure testing.
In other words, Netflix applies testing from top to bottom and start to finish across their entire environment, including but not limited to their software development life-cycle. Given the principle of ‘infrastructure as code’, they know that failures can occur at any point within the overall environment, not just in the code they write.
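To make the idea of canary analysis concrete, here is a minimal sketch, assuming a simple rule: route a small share of traffic to the new version (the canary), collect the same metrics from it and from the current version (the baseline), and promote only if enough metrics are not meaningfully worse. This is not Spinnaker’s API or its scoring algorithm; the metric names, the 10% tolerance and the 75% pass threshold are hypothetical, and production canary analysis uses more rigorous statistical comparisons than a difference of means.

```python
from statistics import mean


def metric_passes(baseline, canary, tolerance):
    """True if the canary's mean is at most `tolerance` (relative) worse than the baseline's.

    Assumes every metric is "lower is better" (e.g. error rate, latency).
    """
    base, can = mean(baseline), mean(canary)
    if base == 0:
        # With a zero baseline, any non-zero canary value counts as a regression.
        return can == 0
    return (can - base) / base <= tolerance


def canary_score(baseline_metrics, canary_metrics, tolerance=0.10):
    """Percentage of metrics on which the canary is not meaningfully worse."""
    names = list(baseline_metrics)
    passed = sum(
        metric_passes(baseline_metrics[n], canary_metrics[n], tolerance) for n in names
    )
    return 100.0 * passed / len(names)


def canary_verdict(score, pass_threshold=75.0):
    """Promote only if enough metrics pass (the threshold is a hypothetical choice)."""
    return "promote" if score >= pass_threshold else "rollback"


# Hypothetical samples: the canary's error rate matches the baseline,
# but its latency has regressed by roughly 50%.
baseline = {"error_rate": [0.010, 0.012, 0.011], "latency_ms": [120, 118, 125]}
canary = {"error_rate": [0.011, 0.012, 0.010], "latency_ms": [180, 175, 190]}

score = canary_score(baseline, canary)
print(score, canary_verdict(score))  # prints: 50.0 rollback
```

Because the verdict is computed automatically, a pipeline can halt a staged deployment without waiting for a human reviewer, which is what lets this kind of check run on every release.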
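The core loop of chaos engineering is to inject a failure deliberately and then verify that the system recovers. The sketch below shows that loop in miniature, using in-process fault injection rather than anything like Chaos Monkey, which terminates real instances. The `FaultInjector` class, the `fetch_recommendations` stand-in service and the static fallback are all hypothetical names chosen for illustration.

```python
import random


class InjectedFailure(Exception):
    """Raised by the injector to simulate a dependency outage."""


class FaultInjector:
    """Hypothetical helper: wraps a callable and fails a configurable fraction of calls."""

    def __init__(self, failure_rate, seed=None):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so experiments are reproducible

    def wrap(self, func):
        def wrapped(*args, **kwargs):
            if self.rng.random() < self.failure_rate:
                raise InjectedFailure(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapped


def fetch_recommendations(user_id):
    # Stand-in for a call to a downstream microservice.
    return [f"title-{user_id}-1", f"title-{user_id}-2"]


FALLBACK = ["popular-1", "popular-2"]


def recommendations_with_fallback(fetch, user_id):
    """Degrade gracefully: serve a static fallback if the dependency fails for any reason."""
    try:
        return fetch(user_id)
    except Exception:
        return FALLBACK


def run_experiment(failure_rate, calls=1000, seed=42):
    """Inject failures, check that every request was still served, and
    return the fraction of requests answered by the fallback."""
    fetch = FaultInjector(failure_rate, seed).wrap(fetch_recommendations)
    results = [recommendations_with_fallback(fetch, uid) for uid in range(calls)]
    # Recovery check: despite the injected failures, no request went unanswered.
    assert all(results), "some requests received no response"
    return sum(r == FALLBACK for r in results) / calls


print(run_experiment(failure_rate=0.3))
```

The important part is the assertion inside `run_experiment`: the experiment does not just cause failures, it checks that users were still served, which is the “test for recovery” half of the discipline.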
Tools and methods
For organizations seeking to emulate this transformation, a number of methods and tools can be considered, including, of course, the components Netflix have open-sourced. Cloud guru David Linthicum makes the point that Cloud Native efforts won’t succeed without a suitable test automation capability like this.
The Cloud Native QA guide echoes the Netflix philosophy, notably: “It is important to not only design for failure but test for recovery”. It recommends a series of QA practices for adopting the same type of culture as Netflix, various ways to test for failure and recovery as they do, and the use of tools such as OpenTracing.
Fernando Mayo explores the modernization of testing for this new microservices world, highlighting practices such as property-based testing, fuzz testing and mutation testing that can help detect a wider range of defects automatically.
On LinkedIn, Shachar Landshut also proposes a framework for testing microservices, escalating from testing individual services, through integration testing, and ultimately to the chaos engineering approach that Netflix use.
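Property-based testing replaces hand-picked examples with general properties that must hold for every generated input. The sketch below is a hand-rolled illustration of the idea using only the standard library; libraries such as Hypothesis add smarter input generation and shrinking of counterexamples. The run-length encoder under test and the helper `check_property` are hypothetical.

```python
import random


def rle_encode(text):
    """Hypothetical code under test: run-length encode a string into (char, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs):
    return "".join(ch * count for ch, count in runs)


def check_property(prop, gen, trials=500, seed=0):
    """Evaluate `prop` on `trials` random inputs; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def random_text(rng):
    # A small alphabet makes repeated characters (runs) common.
    length = rng.randint(0, 20)
    return "".join(rng.choice("ab ") for _ in range(length))


def roundtrip(s):
    """Property: decoding an encoding gives back the original string."""
    return rle_decode(rle_encode(s)) == s


def runs_are_maximal(s):
    """Property: adjacent runs never share a character."""
    runs = rle_encode(s)
    return all(a[0] != b[0] for a, b in zip(runs, runs[1:]))


print(check_property(roundtrip, random_text))
print(check_property(runs_are_maximal, random_text))
```

A deliberately wrong property, for example one that drops the last run before decoding, is refuted almost immediately and `check_property` returns the offending input, which is how these techniques surface defects that a handful of fixed examples might miss.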
Vendor Profile: Stackify
Stackify was founded in 2012 with the goal of creating an easy-to-use set of tools that help developers improve their applications. Its Retrace product helps improve application quality and performance at every stage of development through code profiling, log management, error tracking and monitoring.
Ultimately, the purpose and benefit of building this capability is to accelerate the rate at which new digital innovations are delivered, enhancing the organization’s capacity for Digital Transformation.
On DZone, Cynthia Dunlop emphasizes the critical insight central to this article: traditional enterprise software testing methodology is no longer adequate, because it is too slow and change-limiting to support the rapid digital innovation that businesses now need to aspire to:
“This brings us to an inflection point: given the increased cadence and complexity of software delivery that the business now demands, traditional testing is not capable of de-risking (e.g., thoroughly testing) every release candidate. As the latest test automation research recommends, we must reinvent testing…and soon. It’s not simply a matter of more tools or different tools. Reinventing testing is a deeper transformation involving people, process, and technologies.”
CIO.com repeats this warning and stresses the need to modernize the software testing function as a strategic enabler of Digital Transformation:
“The bottom line is that if you don’t treat testing as a strategic initiative that’s imperative to your digital success, your lunch is going to get eaten by your competitors.”
DevOps.com even reported that enterprises such as Merck experienced an exodus of valuable developers specifically because they stuck with traditional, legacy software testing approaches.
As ever, technology alone isn’t the answer; people and organizational change are critical too.
The most significant transformation Netflix describe is the move away from centralizing functions such as QA and testing. In the Microservices at Netflix Scale presentation, Ruslan explains that Netflix’s top priority, above reliability and efficiency, is digital innovation, and that they identified these centralized organizational stage gates as choke points that directly limited this ambition.
So, hand in hand with transitioning to a microservices architecture, they also devolved QA testing to each individual service team, empowering teams to take full life-cycle ownership of the services they create and manage.
Enterprises serious about mastering Digital Transformation face a challenging scope of change: their departments, tools and procedures for software development are long established and entrenched. However, they now live in a world where companies like Netflix are on the other side of this journey, having spent over seven years undergoing the change.
The best practices for emulating this approach are now well understood, so some, if not many, of their competitors will already be underway with their own transformations, gaining a competitive advantage that will be very difficult to catch up with.
Because these practices are well understood, however, enterprises too can harness them to be the ones doing the eating, not the ones being eaten.