It describes the challenges, identified via our testing strategy, that we overcame during our migration to Continuous Delivery. For example, our automated tests just weren't up to the job: we had good coverage, but performance and reliability issues had to be addressed.
Specifically, in this blog post we’ll be exploring how we renovated our test suites, addressed many of our reliability issues and significantly reduced our release times.
Our primary issue was that we had far too many integration tests, many of them going over the network to services that returned live data, which made them slow and notoriously unreliable.
We also had a huge suite of Selenium UI tests, often compensating for a lack of coverage in our unit tests. They'd take an age to run, only to flake out at the last moment because the live data they depended on had changed.
It could take hours to run all of our tests successfully in TeamCity, and it wasn't always clear whether tests were failing for the right reasons. Running tests locally was also taking far too long and hindering productivity, especially as the number of tests ballooned over time.
Diagram 1 shows how the majority of our tests were run: hitting pages with Selenium, which would then go over the network to our external services.
We decided to take a new approach. We'd break the tests down into smaller suites based on when they'd run, how fast they were and what they tested. We wanted good coverage from fast unit tests which could be run locally and regularly, with the slower suites run automatically in CI.
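To make the idea concrete, here's a minimal sketch of suite partitioning. This isn't our actual test runner (our stack was .NET); the `suite` decorator and the suite names are hypothetical, illustrating how tests can be tagged so fast suites run locally and slow ones run in CI.

```python
# Hypothetical sketch: register tests under named suites so a runner
# can select fast suites locally and slower ones in CI.
SUITES = {"unit": [], "integration": [], "e2e": []}

def suite(name):
    """Register a test function under the named suite."""
    def register(fn):
        SUITES[name].append(fn)
        return fn
    return register

@suite("unit")            # fast: run locally on every change
def test_title_is_capitalised():
    assert "hello".capitalize() == "Hello"

@suite("integration")     # slower: run automatically in CI
def test_page_contains_heading():
    assert "<h1>" in "<h1>Home</h1>"

def run(name):
    """Run every test registered under a suite; return the count run."""
    for test in SUITES[name]:
        test()
    return len(SUITES[name])
```

In practice most test frameworks provide this grouping natively (categories, traits or markers), so the runner, rather than a homegrown registry, does the selection.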
In some cases we were able to rewrite these tests as unit tests, asserting that the final view model was fully populated given our mocked responses from the server. But in many cases we needed to assert against the actual page markup.
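A view-model test of this shape might look like the sketch below. The names `get_article` and `build_view_model` are hypothetical stand-ins for the real service client and mapping code; the point is that the service is mocked, so the test is fast and deterministic.

```python
from unittest.mock import Mock

# Hypothetical mapping code under test: builds a view model
# from a service client's response.
def build_view_model(client, article_id):
    data = client.get_article(article_id)
    return {"title": data["title"], "author": data["author"]}

# Mock the service client so no network call is made.
client = Mock()
client.get_article.return_value = {"title": "CD at scale", "author": "Team"}

view_model = build_view_model(client, 42)
assert view_model == {"title": "CD at scale", "author": "Team"}
client.get_article.assert_called_once_with(42)
```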
In order to reduce the number of slow UI and integration tests, we needed to look at two main areas. Firstly, how we'd fake the external service endpoints.
We used a mocking framework to fake the service in our unit tests, where we covered edge cases and more isolated logic; where we were testing the full stack, we decided to use proxies which would return fake JSON based on the requests that were made.
Secondly, for most of our UI tests, where we just wanted to assert on content in the markup, we introduced an integration test framework (based on the excellent MVCIntegrationTestFramework) which would run in memory and spit out the page HTML for a given URL.
We were able to migrate most of our UI tests to the new integration UI test suite, and although much slower than our unit tests, they were quick enough to be run locally more often. We were also able to use our fake HTTP endpoint proxies in other test suites, including our (now much smaller) E2E Selenium tests.
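The essence of these in-memory UI tests is that a URL is rendered through the application pipeline in-process, with no browser or web server, and the test asserts on the returned markup. A heavily simplified sketch, with a hypothetical `render` function standing in for the MVC pipeline:

```python
# Hypothetical in-memory routing table: URL -> page renderer.
ROUTES = {
    "/home": lambda: "<html><h1>Welcome</h1></html>",
}

def render(url):
    """Render the page for a URL in-process and return its HTML."""
    handler = ROUTES.get(url)
    if handler is None:
        raise KeyError(f"no route for {url}")
    return handler()

# The test asserts on markup, not on a live browser session.
html = render("/home")
assert "<h1>Welcome</h1>" in html
```

Because nothing leaves the process, these tests avoid both browser startup cost and network flakiness, which is what made them fast enough to run locally.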
This meant our smoke tests, which were only run post-release (but before sending traffic) to check that the important pages were working, were the only tests running against pages populated by real data.
We run the various test suites in parallel on multiple TeamCity agents for every branch that's pushed. When we do a release, the master branch is deployed to production (provided all tests have passed), smoke tests are run, and if everything is green we start sending traffic.
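The gating logic of that release flow can be sketched as follows. The function and parameter names are hypothetical (the real pipeline lives in TeamCity build configuration), but the ordering matches the description above: all suites green, then deploy, then smoke test, then traffic.

```python
# Hedged sketch of the release gate, not the real TeamCity pipeline.
def release(suite_results, smoke_test, deploy, send_traffic):
    """Deploy only if every suite passed; send traffic only if smoke tests pass."""
    if not all(suite_results.values()):
        return "blocked: failing suites"
    deploy()
    if not smoke_test():
        return "deployed, traffic withheld: smoke tests red"
    send_traffic()
    return "released"

log = []
result = release(
    {"unit": True, "integration": True, "e2e": True},
    smoke_test=lambda: True,
    deploy=lambda: log.append("deploy"),
    send_traffic=lambda: log.append("traffic"),
)
assert result == "released" and log == ["deploy", "traffic"]
```

Note the asymmetry: failing suites block the deploy entirely, whereas failing smoke tests leave the deploy in place but withhold traffic, matching the "deploy, smoke test, then send traffic" ordering.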
This initial strategy, although constantly evolving, has reduced our deployment times to as little as 30 minutes and increased our release frequency to several times a day when required.