Seeking more robust and purposeful automated tests

I’ve recently been evaluating some of the automated tests for one of the projects I help with, the Kiwix Android app. We have a moderately loose collection of automated tests written using Android’s Espresso framework. The tests that interact with the external environment are prone to problems and failures for various reasons. We need these tests to be trustworthy before we can run them in the CI environment across a wider range of devices; for now we can’t, as they fail just over half the time. (Details are available in one of the issues being tracked by the project team: https://github.com/kiwix/kiwix-android/issues/283.)

The most common failure is in DownloadTest, followed by NetworkTest. Reading the code reveals a mix of silent continuations (where the test proceeds regardless of errors) and implicit expectations (of what’s on the server and the local device); these may well be major contributors to the failures. Furthermore, when a test fails the error message tells us which line of code failed but doesn’t help us understand the situation that caused the failure. At best we know an expectation (i.e. an assertion in the code) wasn’t met at run-time.

Meanwhile I’ve been exploring how Espresso is intended to be used and how much information it can provide about the state of the app via the app’s GUI. Its intended use seems to be to keep that information private: it checks, on behalf of the running test, whether an assertion holds true or not. However, perhaps we can encourage it to be more forthcoming and share information about what the GUI comprises and contains?
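
As a starting point, here’s a minimal sketch (using a well-known Espresso idiom rather than anything Kiwix-specific) of how a test can read information back out of the GUI instead of only asserting against it. The helper name is mine; the ViewAction callbacks are standard Espresso.

import android.support.test.espresso.UiController;
import android.support.test.espresso.ViewAction;
import android.view.View;
import android.widget.TextView;
import org.hamcrest.Matcher;

import static android.support.test.espresso.Espresso.onView;
import static android.support.test.espresso.matcher.ViewMatchers.isAssignableFrom;

// Hypothetical helper: returns the text currently displayed by a TextView,
// so the test can log it or make decisions based on it.
public static String textOf(final Matcher<View> viewMatcher) {
  final String[] captured = new String[1];
  onView(viewMatcher).perform(new ViewAction() {
    @Override
    public Matcher<View> getConstraints() {
      return isAssignableFrom(TextView.class);
    }

    @Override
    public String getDescription() {
      return "read the text from a TextView";
    }

    @Override
    public void perform(UiController uiController, View view) {
      captured[0] = ((TextView) view).getText().toString();
    }
  });
  return captured[0];
}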

I’ll use these two tests (DownloadTest and NetworkTest) as worked examples, trying to find ways to make them more robust and also more informative about the state of the server, the device, and the app.

Situations I’d like the tests to cope with:

  • One or more of the ZIM files are already on the local device: we don’t need to assume the device doesn’t have these files locally.
  • We can download any small ZIM file, not necessarily a predetermined ‘canned’ one.

Examples of information I’d like to ascertain:

  • How many files are already on the device, and details of these files (see the sketch after this list)
  • Details of ZIM files available from the server, including the filename and size
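
Here’s a rough sketch of gathering the first kind of information: how many ZIM files are already on the device, and their names and sizes. The helper and the storage directory passed to it are my assumptions; the real app may keep its content elsewhere. Getting the equivalent details from the server would mean querying whatever library listing the app itself downloads, which I haven’t sketched here.

import android.util.Log;
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical helper: count and describe the ZIM files already on the device.
static void logLocalZimFiles(File storageDir) {
  File[] zimFiles = storageDir.listFiles(new FilenameFilter() {
    @Override
    public boolean accept(File dir, String name) {
      return name.endsWith(".zim");
    }
  });
  if (zimFiles == null) {
    Log.i("KiwixTest", "No readable directory at " + storageDir);
    return;
  }
  Log.i("KiwixTest", zimFiles.length + " ZIM file(s) already on the device");
  for (File zim : zimFiles) {
    Log.i("KiwixTest", zim.getName() + " (" + zim.length() + " bytes)");
  }
}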

Possible approaches to interacting with Espresso

I’m going to assume you either know how Espresso works or are willing to learn about it – perhaps by writing some automated tests using it? 🙂 A good place to start is the Android Testing Codelab, freely available online.

Perhaps we could experiment with a question or query interface where the automated test can ask questions and elicit responses from Espresso. Something akin to the Socratic Method? This isn’t intended to replace the current way of using Espresso and the Hamcrest Matchers.
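
To make that concrete, here’s the kind of interface I have in mind. None of this exists in Espresso today; the type, method names and signatures are placeholders for discussion.

import android.view.View;
import org.hamcrest.Matcher;

// A purely hypothetical 'query' layer on top of Espresso: the test asks
// questions and gets answers back, instead of only handing judgements over
// to assertions.
public interface GuiOracle {
  // How many items does a list (e.g. the library list) currently show?
  int countItemsIn(Matcher<View> adapterViewMatcher);

  // What text does a particular view display right now?
  String textOf(Matcher<View> viewMatcher);

  // Is any view matching this currently displayed?
  boolean isCurrentlyDisplayed(Matcher<View> viewMatcher);
}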

Who is the decision maker?

In popular open-source test automation frameworks, including JUnit and Espresso (via Hamcrest), the arbiter, or decision maker, is the assertion: the test passes information to the assertion, and the assertion decides whether to allow the test to continue or to halt and abort it. The author of the automated test can choose to write extra code to handle a rejection, but that code still doesn’t know the cause of the rejection. Here’s an example of part of the DownloadTest at the time of writing. The try/catch means the test will continue regardless of whether the click works.


// Select the 'ray_charles' entry in the library list and click it.
onData(withContent("ray_charles")).inAdapterView(withId(R.id.library_list)).perform(click());

// android.R.id.button1 is conventionally the positive button of a standard
// Android dialog; the test doesn't say which dialog it expects, and the empty
// catch block means the test continues even if the click never happened.
try {
  onView(withId(android.R.id.button1)).perform(click());
} catch (RuntimeException e) {
}

This code snippet exemplifies many Espresso tests: a reader can determine certain details, such as the content the test intends to click on, but there’s little clue what the second click is meant to do from the user’s perspective – what is the button, what is the button ‘for’, and why would a click legitimately fail yet the test be OK to continue?
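
One small improvement, sketched below rather than taken from the project’s code, is to record what we believe the button is and to log when the click isn’t possible, instead of silently swallowing every RuntimeException:

import android.support.test.espresso.NoMatchingViewException;
import android.util.Log;

// The comment records our belief about the dialog; the narrower catch plus a
// log line records the situation when that belief doesn't hold.
try {
  // android.R.id.button1 is assumed to be the positive button of a
  // confirmation dialog shown before the download starts.
  onView(withId(android.R.id.button1)).perform(click());
} catch (NoMatchingViewException e) {
  Log.i("KiwixTest", "No confirmation dialog was shown; continuing: " + e.getMessage());
}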

Sometimes I’d like the test to be able to decide what to do depending on the actual state of the system. What would we like the test to do?

For a download test, perhaps another file would be as useful?

Increasing robustness of the tests

For me, a download test should focus on being able to test the download of a representative file and be able to do so even if the expected file is already on the local device. We can decide what we’d like it to do in various circumstances, e.g. perhaps it could simply delete the local instance of a test file such as the one for Ray Charles? The ‘cost’ of re-downloading this file is tiny (at least compared to Wikipedia in English) if the user wants to have this file on the device. Or conversely, perhaps the test could leave the file on the device once it has downloaded it, if the file was there before the test started – a sort-of refresh of the content… (I’m aware there are potential side-effects if the contents have changed, or if the download fails.)
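
As a sketch of the first option: delete any existing copy before the download so the test starts from a known state. The directory and filename are assumptions for illustration only.

import android.util.Log;
import java.io.File;

// Hypothetical set-up step: if a copy of the test file is already on the
// device, delete it so the download itself is what the test exercises.
static void deleteExistingTestFile(File storageDir, String zimFileName) {
  File existing = new File(storageDir, zimFileName);
  if (existing.exists()) {
    boolean deleted = existing.delete();
    Log.i("KiwixTest", "Deleted existing " + zimFileName + "? " + deleted);
  }
}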

Would we like the automated test to retry a download if the download fails? If so, how often? And should the tests report failed downloads anywhere? I’ll cover logging shortly.
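
If we do want retries, a bounded loop is probably enough. This sketch assumes a hypothetical attemptDownload() helper that returns true on success; the retry count is arbitrary.

import static org.junit.Assert.assertTrue;

// Retry the download a small, bounded number of times, logging each failure
// so the log tells the story even if the test eventually passes.
boolean downloaded = false;
for (int attempt = 1; attempt <= 3 && !downloaded; attempt++) {
  downloaded = attemptDownload();  // hypothetical helper
  if (!downloaded) {
    Log.w("KiwixTest", "Download attempt " + attempt + " failed");
  }
}
assertTrue("Download did not succeed after 3 attempts", downloaded);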

More purposeful tests

Tests often serve multiple purposes, such as:

  • Confidence: Having large volumes of tests ‘passing’ may provide confidence to project teams.
  • Feedback: Automated tests can provide fast, almost immediate, feedback on changes to the codebase. They can also be run on additional devices, configurations (e.g. locales), etc. to provide extra feedback about aspects of the app’s behaviours in these circumstances.
  • Information: tests can gather and present information such as response times, installation time, collecting screenshots (useful for updating them in the app store blurb), etc.
  • Early ‘warning’: for instance, of things that might go awry soon if volumes increase, conditions worsen, etc.
  • Diagnostics: tests can help us compare behaviours, e.g. not only where, when, etc. does something fail, but also where, when, etc. does it work? The comparisons can help establish boundaries and equivalence partitions to help us home in on problems, find patterns, and so on.

Test runners (e.g. JUnit) don’t encourage logging information, especially if the test completes ‘successfully’ (i.e. without unhandled exceptions or assertion failures). Logging is often used in the application code; it can also be used by the tests. As a good example, Espresso automatically logs all interactions to the Android log, which may help us (assuming we read the logs and pay attention to their contents) to diagnose aspects of how the tests are performing.
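
Our own tests can do the same. Here’s a minimal example of a test recording useful information even when it passes: how long the download took.

import android.util.Log;

// Record the elapsed time of the download and write it to the Android log,
// whether or not the surrounding test ultimately passes.
long start = System.currentTimeMillis();
// ... perform the download steps here ...
long elapsedMillis = System.currentTimeMillis() - start;
Log.i("KiwixTest", "Download completed in " + elapsedMillis + " ms");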


Next Steps

This blog post is a snapshot of where I’ve got to. I’ll publish updates as I learn and discover more.

