How to Save Time Running Automated Tests with Parallel CI Machines
Slow automated tests
Automated tests can be considered slow when programmers stop running the whole test suite on their local machine because it takes too long. Most of the time you use a CI server such as Jenkins, CircleCI, or GitHub Actions to run your tests on an external machine instead of your own. When your test suite runs for an hour, running it on your own computer is not efficient. Browser end-to-end tests for a web project can take a particularly long time to execute. But waiting an hour for a CI server to finish is not efficient either. As a developer you need a fast feedback loop to know whether your software works, and automated tests should help you with that.
Split tests between many CI machines to save time
One way to save time is to make your CI build as fast as possible. When your tests take, for example, 1 hour to run, you can use your CI server configuration to set up parallel jobs (parallel CI machines, also called nodes). Each parallel job then runs a chunk of the test suite.
You need to divide your tests between the parallel CI machines. With a 60-minute test suite you can run 20 parallel jobs, each running a small subset of the tests, and this should save you time. In the optimal scenario each job would run tests for 3 minutes (60 minutes divided by 20 jobs).
How do you make sure each job runs for about 3 minutes? As a first step you can apply a simple solution: sort all of your test files alphabetically and divide them into as many equally sized groups as you have parallel jobs. However, each test file can have a different execution time, depending on how many test cases it contains and how complex each test case is, so you can end up with test files divided in a suboptimal way. The image below illustrates a suboptimal split of tests between parallel CI jobs, where one job runs too many tests and ends up being a bottleneck.
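To see why a count-based split can go wrong, here is a minimal Python sketch, assuming six hypothetical test files with made-up timings in minutes. It sorts the file names alphabetically, cuts them into equal-count chunks (one per job), and sums the time each job would spend:

```python
from typing import Dict, List


def split_alphabetically(files: Dict[str, float], jobs: int) -> List[List[str]]:
    """Sort file names and cut them into contiguous chunks of (almost) equal count."""
    names = sorted(files)
    size, extra = divmod(len(names), jobs)
    groups: List[List[str]] = []
    start = 0
    for i in range(jobs):
        # The first `extra` jobs get one additional file when the count doesn't divide evenly.
        end = start + size + (1 if i < extra else 0)
        groups.append(names[start:end])
        start = end
    return groups


def job_times(groups: List[List[str]], files: Dict[str, float]) -> List[float]:
    """Total execution time of each job: the sum of its files' times."""
    return [sum(files[name] for name in group) for group in groups]


# Hypothetical per-file timings in minutes (not real measurements).
timings = {
    "spec/a_spec.rb": 1.0,
    "spec/b_spec.rb": 1.0,
    "spec/c_spec.rb": 8.0,
    "spec/d_spec.rb": 1.0,
    "spec/e_spec.rb": 1.0,
    "spec/f_spec.rb": 2.0,
}

groups = split_alphabetically(timings, jobs=3)
print(job_times(groups, timings))  # [2.0, 9.0, 3.0]
```

Every job gets two files, but the job holding spec/c_spec.rb needs 9 minutes while the others finish in 2 and 3, so the build takes 9 minutes instead of the roughly 4.7 minutes (14 / 3) a balanced split could approach.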
Static test split based on test execution time
You can use the execution time of each test file to decide how to divide the test suite between parallel jobs so that each job spends a similar amount of time running tests.
You could use tools like the open source knapsack Ruby gem, designed for Ruby and its test runners such as RSpec, Minitest, and Cucumber. You can even track all your commits and record the execution time of your tests per CI build with the knapsack_pro gem to better predict how your next CI build should distribute the tests across parallel CI jobs.
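The sketch below illustrates the general idea of a time-based split with a well-known greedy heuristic, longest processing time first (LPT); it is an assumption for illustration, not necessarily the algorithm knapsack or knapsack_pro use. Files are sorted from slowest to fastest, and each one goes to the job with the smallest total time so far. The file names and timings are hypothetical:

```python
import heapq
from typing import Dict, List, Tuple


def split_by_time(files: Dict[str, float], jobs: int) -> List[List[str]]:
    """Greedy LPT split: each file goes to the job with the smallest load so far."""
    # Min-heap of (current load, job index); the least-loaded job is on top.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(jobs)]
    groups: List[List[str]] = [[] for _ in range(jobs)]
    # Place the slowest files first so the fast ones can fill the remaining gaps.
    for name in sorted(files, key=lambda n: files[n], reverse=True):
        load, job = heapq.heappop(heap)
        groups[job].append(name)
        heapq.heappush(heap, (load + files[name], job))
    return groups


# Hypothetical per-file timings in minutes, e.g. recorded during a previous CI build.
timings = {
    "spec/a_spec.rb": 1.0,
    "spec/b_spec.rb": 1.0,
    "spec/c_spec.rb": 8.0,
    "spec/d_spec.rb": 1.0,
    "spec/e_spec.rb": 1.0,
    "spec/f_spec.rb": 2.0,
}

groups = split_by_time(timings, jobs=3)
print(groups)
print([sum(timings[f] for f in g) for g in groups])  # [8.0, 3.0, 3.0]
```

The 8-minute file sits alone on one job while the other two jobs get 3 minutes each, so the build takes 8 minutes. No file-level split can finish faster than the slowest single file.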
Dividing test files between parallel jobs based on test file execution time has some limitations. One is a test file with non-deterministic execution time. This can happen when an end-to-end browser test needs to load a complex web page and the browser makes many requests to external APIs that sometimes respond slowly. Many components are needed to fully load the page, and each of them can affect how fast the test case runs, so the page sometimes loads faster and sometimes slower. This changes how long it takes to run the test file. The effect often compounds across many test files, so a parallel job can run much longer than the expected 3 minutes, becoming a bottleneck that slows down the whole CI build.
Another common issue is the start time of each parallel job. Before running tests you often need to prepare the environment, for example by installing dependencies inside the parallel job or loading data from a cache. As a result, parallel jobs can start running tests at different points in time, and the job that starts last can become the bottleneck for the whole CI build.
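Both problems can be illustrated with a small model. The sketch below assumes a hypothetical static plan that looks perfectly balanced at 3 minutes per job, then applies two made-up disturbances: one file runs 1.5 times slower than recorded (for example because of a slow external API), and one job starts 2 minutes late because of a slow environment setup. A job's finish time is its start delay plus the actual run time of its files, and the build finishes when the slowest job does:

```python
from typing import Dict, List


def static_finish_times(
    groups: List[List[str]],
    planned: Dict[str, float],
    slowdown: Dict[str, float],
    start_delay: List[float],
) -> List[float]:
    """Finish time of each job for a fixed (static) assignment of files."""
    finish = []
    for job, group in enumerate(groups):
        # Actual run time = recorded time scaled by this build's slowdown (1.0 if none).
        run = sum(planned[f] * slowdown.get(f, 1.0) for f in group)
        finish.append(start_delay[job] + run)
    return finish


# Hypothetical recorded timings in minutes; each group below sums to 3 minutes.
planned = {"a": 1.5, "b": 1.5, "c": 3.0, "d": 1.0, "e": 1.0, "f": 1.0}
groups = [["a", "b"], ["c"], ["d", "e", "f"]]

finish = static_finish_times(
    groups,
    planned,
    slowdown={"c": 1.5},          # "c" hits a slow external API this time
    start_delay=[0.0, 0.0, 2.0],  # job 2 spends 2 extra minutes on setup
)
print(finish)       # [3.0, 4.5, 5.0]
print(max(finish))  # 5.0
```

The plan promised 3 minutes everywhere, yet the build takes 5 minutes, and jobs 0 and 1 sit idle at the end because a static split cannot move work between them.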
Dynamic test split between parallel jobs
To address the problems with a static test split, you can use a dynamic test split. The idea is simple: you keep a list of test files in a queue. Each parallel job takes a few test files from the queue and executes them. When it finishes, it takes another set of test files from the queue. This continues until the whole queue is consumed. This way, if one of the parallel jobs happens to run slow test cases, it simply ends up consuming fewer test files from the queue. The same happens with a job that starts late due to a slow environment setup: it also runs fewer test cases.
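The queue behaviour can be simulated in a few lines. This is a minimal Python sketch, not how any particular tool implements it: in a real CI setup the queue has to be shared by all parallel jobs (for example kept on a server), whereas here it is an in-memory deque, and each simulated job pulls a batch of files whenever it becomes free. The scenario is hypothetical: three jobs, file "c" runs slower than recorded, and job 2 starts 2 minutes late. The queue is ordered slowest-first by recorded time, an assumed design choice that keeps long files from being picked up at the very end:

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple


def dynamic_split(
    queue_order: List[str],
    actual: Dict[str, float],
    start_delay: List[float],
    batch_size: int = 1,
) -> Tuple[List[float], List[List[str]]]:
    """Simulate jobs that pull batches of files from a shared queue until it is empty."""
    queue = deque(queue_order)
    # Min-heap of (time the job becomes free, job index).
    free_at = [(delay, job) for job, delay in enumerate(start_delay)]
    heapq.heapify(free_at)
    finish = list(start_delay)
    consumed: List[List[str]] = [[] for _ in start_delay]
    while queue:
        now, job = heapq.heappop(free_at)
        # The earliest free job takes the next batch from the head of the queue.
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        consumed[job].extend(batch)
        now += sum(actual[f] for f in batch)
        finish[job] = now
        heapq.heappush(free_at, (now, job))
    return finish, consumed


# Hypothetical actual timings in minutes for this build ("c" ran slower than recorded).
actual = {"a": 1.5, "b": 1.5, "c": 4.5, "d": 1.0, "e": 1.0, "f": 1.0}
queue_order = ["c", "a", "b", "d", "e", "f"]  # slowest first by recorded time

finish, consumed = dynamic_split(queue_order, actual, start_delay=[0.0, 0.0, 2.0])
print(finish)    # [4.5, 4.0, 4.0]
print(consumed)  # [['c'], ['a', 'b', 'e'], ['d', 'f']]
```

Here the build finishes after 4.5 minutes, bounded by the slow file "c" itself, and the late-starting job 2 consumes only two files while job 1 picks up three.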
The graph below shows how a dynamic split works with the queue.
A dynamic test split should result in an optimal split of the test suite between parallel CI jobs, as shown in the graph below.
How to implement a dynamic test split with a queue
The example below is a generic CI server config that uses Knapsack Pro in Queue Mode for both Ruby and JavaScript test runners; adapt the syntax to your CI provider.

    jobs:
      - name: Run Ruby tests with Knapsack Pro
        parallelism: 10 # run 10 parallel CI nodes
        commands:
          # Run RSpec specs in parallel
          - run: bundle exec rake knapsack_pro:queue:rspec
          # Run Minitest tests in parallel
          - run: bundle exec rake knapsack_pro:queue:minitest
          # Run Cucumber tests in parallel
          - run: bundle exec rake knapsack_pro:queue:cucumber
          # ... other Ruby test runners here
      - name: Run JS tests with Knapsack Pro
        parallelism: 4
        commands:
          # npx replaces $(npm bin)/..., because "npm bin" was removed in npm 9
          # Run Cypress tests in parallel
          - run: npx knapsack-pro-cypress
          # Run Jest tests in parallel
          - run: npx knapsack-pro-jest
[source code can be found at https://gist.github.com/ArturT/580df4fd7852e67379e9b263228e1994#file-ci-server-config-yaml ]
Ruby developers may also find a feature dedicated to the RSpec test runner useful: it automatically detects slow test files and divides them into individual test cases, so a slow RSpec test file can also be run across parallel jobs.
I hope you find this useful and manage to save time in your daily work by using fast parallel CI builds.
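The idea behind splitting a slow file can be sketched as a preprocessing step before the queue is built. The following minimal Python sketch assumes you already know each file's recorded time and the IDs of the examples inside it; the function name and the threshold value are hypothetical, and this is not the actual logic of the RSpec feature described above. It relies only on the fact that RSpec can run a single example by its ID, for example: rspec "spec/slow_spec.rb[1:2]".

```python
from typing import Dict, List


def expand_slow_files(
    file_times: Dict[str, float],
    example_ids: Dict[str, List[str]],
    threshold: float,
) -> List[str]:
    """Replace every file slower than `threshold` with one queue item per example."""
    units: List[str] = []
    for path, seconds in file_times.items():
        ids = example_ids.get(path, [])
        if seconds > threshold and ids:
            # RSpec accepts example IDs in this form, e.g. "spec/slow_spec.rb[1:2]".
            units.extend(f"{path}[{eid}]" for eid in ids)
        else:
            units.append(path)
    return units


# Hypothetical recorded timings (seconds) and example IDs.
file_times = {"spec/fast_spec.rb": 30.0, "spec/slow_spec.rb": 480.0}
example_ids = {"spec/slow_spec.rb": ["1:1", "1:2", "1:3"]}

print(expand_slow_files(file_times, example_ids, threshold=120.0))
# ['spec/fast_spec.rb', 'spec/slow_spec.rb[1:1]', 'spec/slow_spec.rb[1:2]', 'spec/slow_spec.rb[1:3]']
```

Each resulting item can then go into the same queue as ordinary files, so different parallel jobs can pick up different examples of the slow file.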