Fledgling heuristics for testing Android apps

I’ve been inspired to have a go at creating some guidelines for testing Android apps. The initial request was to help shape interviews so as to identify people who understand some of the challenges and approaches for testing on Android and for testing Android apps. I hope these will serve actual testing of the apps too.

  • Android Releases and API Versions
  • OnRotation and other Configuration Changes
  • Fundamental Android concepts including: Activities, Services, Intents, Broadcast Receivers, and Content Providers  https://developer.android.com/guide/components/fundamentals
  • Accessibility settings
  • Applying App Store data and statistics
  • Crashes and ANRs
  • Using in-app analytics to compare our testing with how users actually use the app
  • Logs and Screenshots
  • SDK tools, including Logcat, adb, and monitor
  • Devices, including sensors, resources, and CPUs
  • Device Farms: services that provide access to remote devices for rent (often in the ‘cloud’)
  • Permissions granted & denied
  • Alpha & Beta channels
  • Build Targets (Debug, Release & others)
  • Test Automation Frameworks and Monkey Testing

I’ll continue exploring ideas and topics to include. Perhaps a memorable heuristic phrase will emerge, suggestions welcome on twitter https://twitter.com/julianharty

Working with pepper-box to test Kafka


We needed to do performance testing of multi-regional Kafka clusters, and we ended up using pepper-box for most of our work. We had to first understand, then use, then extend and enhance the capabilities of pepper-box during the assignment. Here is an overview of what we did when working with pepper-box. As we published our code and other related materials on github, you can see more of the details there: https://github.com/commercetest/pepper-box

Pepper-box and jmeter

In order to use pepper-box we first needed to understand the fundamentals of jmeter. Using jmeter to test a non-web protocol such as Kafka took significant effort: we spent lots of time learning about various aspects of compiling the code, configuring jmeter, and running the tests.

Thankfully as there were both kafkameter and pepper-box we were able to learn lots from various articles as well as the source code. Key articles include:

The blazemeter article even included an example consumer script written in a programming language called Groovy. We ended up extending this script significantly and making it available as part of our fork of pepper-box (since it didn’t seem sensible to create a separate project for this script) https://github.com/commercetest/pepper-box/blob/master/src/groovyscripts/kafka-consumer-timestamp.groovy 

As ever there was lots of other reading and experimentation to be able to reliably and consistently develop the jmeter plugins. Lowlights included needing to convince both the machine and maven that we actually needed to use Java 8.

Extending pepper-box to support security protocols

Business requirements mandated that data would be secured throughout the system. Kafka supports various security mechanisms. SASL enables nodes to authenticate themselves to Kafka instances. Connections were secured using what’s commonly known as SSL (e.g. see http://info.ssl.com/article.aspx?id=10241), although the security is actually provided by its successor, TLS (see https://docs.confluent.io/current/kafka/encryption.html).

A key facet of the work was adding support to both the producer and consumer code so that they could be used with clusters configured with and without security, in particular SASL_SSL. The code is relatively easy to write, but debugging issues with it was very time-consuming, especially as we had to test in a variety of environments, each with a different configuration, and none of the team had prior experience of configuring Kafka with SASL_SSL before the project started.
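For reference, the client-side settings boil down to a handful of properties. Here’s a minimal sketch in Java of the configuration a producer or consumer needs for a SASL_SSL cluster; the broker address, credentials, and truststore path are placeholders, and PLAIN is just one of the SASL mechanisms Kafka supports:

```java
import java.util.Properties;

public class SaslSslConfig {
    // Builds Kafka client properties for a SASL_SSL-secured cluster.
    // All hostnames, paths, and credentials below are placeholders.
    static Properties secureClientProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
          + "username=\"client\" password=\"client-secret\";");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "truststore-secret");
        return props;
    }

    public static void main(String[] args) {
        Properties p = secureClientProps();
        System.out.println(p.getProperty("security.protocol"));
    }
}
```

The same properties object is then handed to the KafkaProducer or KafkaConsumer constructor; getting any one of these values wrong tends to produce opaque handshake failures, which is where most of our debugging time went.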

We ran into multiple issues related to the environments and getting the Kafka clusters to stay healthy and the replication to happen without major delays. I may be able to cover some of the details in subsequent articles. We also realised that using the pepper-box java sampler (as they’re called in jmeter terminology) used lots of CPU and we needed to start running load generators and consumers in parallel.

Standalone pepper-box

We eventually discovered the combination of jmeter and the pepper-box sampler was maxing out and unable to generate the loads we wanted to create to test various aspects of the performance and latency. Thankfully the original creators of pepper-box had provided a standalone load generation utility which was able to generate significantly higher loads. We had to trade off the extra performance against the various capabilities of jmeter and the many plugins that have been developed for jmeter over the years: we’d have to manage synchronisation of load generators on multiple machines ourselves, and so on.

The next challenge was to decide whether to develop an equivalent standalone consumer ourselves. In the end we did, partly as jmeter had lost credibility with the client so it wasn’t viable to continue using the current consumer.

Developing a pepper-box consumer

The jmeter-groovy-consumer wasn’t maxing out; however, it made little sense to run dissimilar approaches (a standalone producer written in Java combined with jmeter + Groovy), which added to the overall complexity and complications for more involved tests. Therefore we decided to create a consumer modelled on the standalone producer. We didn’t end up adding rate-limiting as it didn’t really suit the testing we were doing; otherwise they’re fairly complementary tools. The producer sends a known message format which is parsed by the consumer, which calculates latency and writes the output to a csv file per topic. The consumer polls for messages using default values (e.g. a limit of 500 messages per poll request). These could easily be adapted with further tweaks and improvements to the code.

Using polling leads to a couple of key effects:

  1. It uses less CPU. Potentially several Consumers can be run on the same machine to process messages from several Producers (Generators) running across a bank of machines.
  2. The granularity of the timing calculations is constrained by the polling interval.
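The latency calculation itself can be sketched in plain Java. The message layout used here (an id, the send timestamp in milliseconds, then the payload) is an illustrative assumption rather than the exact pepper-box schema:

```java
public class LatencyCalc {
    // The producer embeds a send timestamp in each message; the consumer
    // parses it on receipt and computes end-to-end latency.
    // Assumed message format: "<id>,<sendTimestampMillis>,<payload>".
    static long latencyMillis(String message, long receiveTimestampMillis) {
        String[] fields = message.split(",", 3);
        long sentAt = Long.parseLong(fields[1]);
        return receiveTimestampMillis - sentAt;
    }

    public static void main(String[] args) {
        // A message sent at t=1000ms and received at t=1750ms: 750ms latency.
        System.out.println(latencyMillis("msg-1,1000,hello", 1750));
    }
}
```

Note that because the consumer polls in batches, the receive timestamp is really the poll time, so measured latencies are only accurate to within the polling interval.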

Summary of the standalone pepper-box tools

Both the producer and the consumer are more functional than elegant, and neither is very forgiving of errors or of missing or incorrect parameters. It’d be great to improve their usability at some point.

Seeking more robust and purposeful automated tests

I’ve recently been evaluating some of the automated tests for one of the projects I help, the Kiwix Android app. We have a moderately loose collection of automated tests written using Android’s Espresso framework. The tests that interact with the external environment are prone to problems and failures for various reasons. We need these tests to be trustworthy in order to run them in the CI environment across a wider range of devices. For now we can’t, as these tests fail just over half the time. (Details are available in one of the issues being tracked by the project team: https://github.com/kiwix/kiwix-android/issues/283.)

The most common failure is in DownloadTest, followed by NetworkTest. From reading the code, we have a mix of silent continuations (where the test proceeds regardless of errors) and implicit expectations (of what’s on the server and the local device); these may well be major contributors to the failures of the tests. Furthermore, when a test fails, the error message tells us which line of code the test failed on but doesn’t help us understand the situation that caused the test to fail. At best we know an expectation wasn’t met at run-time (i.e. an assertion in the code).
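One low-cost improvement is to make assertions carry a description of the situation rather than just a line number. A minimal, framework-free sketch in plain Java (the helper name and the example message are invented):

```java
public class InformativeAssert {
    // Sketch: an assertion that reports the situation that failed,
    // not merely the fact that some expectation wasn't met.
    static void assertState(boolean condition, String situation) {
        if (!condition) {
            throw new AssertionError("Expectation failed: " + situation);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulate a failing check with a descriptive message.
            assertState(false, "server listed 0 ZIM files; expected at least 1");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

When a failure like this appears in a CI log, the message itself tells us about the state of the server and device, rather than forcing us to reconstruct it from a stack trace.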

Meanwhile I’ve been exploring how Espresso is intended to be used and how much information it can provide about the state of the app via the app’s GUI. It seems that the intended use is for it to keep information private where it checks on behalf of the running test whether an assertion holds true, or not. However, perhaps we can encourage it to be more forthcoming and share information about what the GUI comprises and contains?

I’ll use these two tests (DownloadTest and NetworkTest) as worked examples where I’ll try to find ways to make these tests more robust and also more informative about the state of the server, the device, and the app.

Situations I’d like the tests to cope with:

  • One or more of the ZIM files are already on the local device: we don’t need to assume the device doesn’t have these files locally.
  • We can download any small ZIM file, not necessarily a predetermined ‘canned’ one.

Examples of information I’d like to ascertain:

  • How many files are already on the device, and details of these files
  • Details of ZIM files available from the server, including the filename and size

Possible approaches to interacting with Espresso

I’m going to assume you either know how Espresso works or are willing to learn about it – perhaps by writing some automated tests using it? 🙂 A good place to start is the Android Testing Codelab, freely available online.

Perhaps we could experiment with a question or query interface where the automated test can ask questions and elicit responses from Espresso. Something akin to the Socratic Method? This isn’t intended to replace the current way of using Espresso and the Hamcrest Matchers.
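As a thought experiment, such an interface might look like the sketch below. All of the names are invented and nothing like this exists in Espresso today; the fake implementation merely stands in for one that would be backed by real view matchers:

```java
import java.util.Map;

// Hypothetical "Socratic" query interface: rather than only asserting,
// the test can ask questions about the GUI and act on the answers.
interface GuiOracle {
    int countListItems(String listId);
    boolean isVisible(String viewId);
}

// A fake implementation standing in for a real Espresso-backed one.
class FakeGuiOracle implements GuiOracle {
    private final Map<String, Integer> listSizes;
    FakeGuiOracle(Map<String, Integer> listSizes) { this.listSizes = listSizes; }
    public int countListItems(String listId) { return listSizes.getOrDefault(listId, 0); }
    public boolean isVisible(String viewId) { return listSizes.containsKey(viewId); }
}

public class SocraticQueryDemo {
    public static void main(String[] args) {
        GuiOracle oracle = new FakeGuiOracle(Map.of("zim_file_list", 3));
        // The test can now branch on the answer instead of failing outright.
        System.out.println(oracle.countListItems("zim_file_list"));
    }
}
```

The point of the sketch is the shift in control: the test asks a question and decides what to do, instead of delegating the pass/fail decision entirely to a matcher.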

Who is the decision maker?

In popular opensource test automation frameworks, including junit and espresso (via hamcrest), the arbiter, or decision maker, is the assertion: the test passes information to the assertion and it decides whether to allow the test to continue or to halt and abort the test. The author of the automated test can choose to write extra code to handle any rejection, but the test still doesn’t know the cause of the rejection. Here’s an example of part of the DownloadTest at the time of writing. The try/catch means the test will continue regardless of whether the click works.


try {
    // (Reconstructed snippet; the identifiers are illustrative.)
    onData(withContent("ray_charles")).perform(click());
    onView(withId(R.id.download_button)).perform(click());
} catch (RuntimeException e) {
    // Swallowed: the test carries on even if the clicks failed.
}
This code snippet exemplifies many Espresso tests: a reader can determine certain details, such as the content the test is intended to click on, but there’s little clue what the second click is intended to do from the user’s perspective. What’s the button, what’s the button ‘for’, and why would a click legitimately fail and yet the test be OK to continue?

Sometimes I’d like the test to be able to decide what to do depending on the actual state of the system. What would we like the test to do?

For a download test, perhaps another file would be as useful?

Increasing robustness of the tests

For me, a download test should focus on being able to test the download of a representative file and be able to do so even if the expected file is already on the local device. We can decide what it’d like to do in various circumstances e.g. perhaps it could simply delete the local instance of a test file such as the one for Ray Charles? The ‘cost’ of re-downloading this file is tiny (at least compared to Wikipedia in English) if the user wants to have this file on the device. Or conversely perhaps the test could leave the file on the device once it’s downloaded it if the file was there before it started  – a sort-of refresh of the content… (I’m aware there are potential side-effects if the contents have been changed; or if the download fails.)
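As a sketch of that approach in plain Java (the file name is illustrative; a real implementation would use the app’s storage APIs to locate its ZIM files):

```java
import java.io.File;
import java.io.IOException;

public class FreshDownloadHelper {
    // Sketch: before a download test runs, remove any local copy of the
    // test file so that the download is genuinely exercised, regardless
    // of what was on the device beforehand.
    static boolean deleteIfPresent(File localZim) {
        return localZim.exists() && localZim.delete();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("wikipedia_en_ray_charles", ".zim");
        System.out.println(deleteIfPresent(f));  // file existed, so it is deleted
        System.out.println(deleteIfPresent(f));  // nothing left to delete
    }
}
```

The same helper makes the alternative policy easy too: record whether the file existed before the test, and restore or keep it afterwards.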

Would we like the automated test to retry a download if the download fails? If so, how often? And should the tests report failed downloads anywhere? I’ll cover logging shortly.

More purposeful tests

Tests often serve multiple purposes, such as:

  • Confidence: Having large volumes of tests ‘passing’ may provide confidence to project teams.
  • Feedback: Automated tests can provide fast, almost immediate, feedback on changes to the codebase. They can also be run on additional devices, configurations (e.g. locales), etc. to provide extra feedback about aspects of the app’s behaviours in these circumstances.
  • Information: tests can gather and present information such as response times, installation time, collecting screenshots (useful for updating them in the app store blurb), etc.
  • Early ‘warning’: for instance, of things that might go awry soon if volumes increase, conditions worsen, etc.
  • Diagnostics: tests can help us compare behaviours e.g. not only where, when, etc. does something fail? but also where, when, etc. does it work? The comparisons can help establish boundaries and equivalence partitions to help us hone in on problems, find patterns, and so on.

Test runners (e.g. junit) don’t encourage logging information, especially if the test completes ‘successfully’ (i.e. without unhandled exceptions or assertion failures). Logging is often used in the application code; it can also be used by the tests. As a good example, Espresso automatically logs all interactions to the Android log, which may help us (assuming we read the logs and pay attention to their contents) to diagnose aspects of how the tests are performing.
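A minimal, framework-free sketch of the idea: a check that records its context in the log whether or not it passes, so that even a ‘green’ run leaves a trail. The helper and the message format are invented:

```java
import java.util.logging.Logger;

public class LoggingTestSketch {
    private static final Logger LOG = Logger.getLogger("DownloadTest");

    // Hypothetical check that logs its context regardless of the outcome,
    // so a passing run still records what it observed.
    static boolean checkDownloadCount(int actual, int expected) {
        LOG.info("Download count: actual=" + actual + ", expected=" + expected);
        return actual == expected;
    }

    public static void main(String[] args) {
        System.out.println(checkDownloadCount(3, 3));
    }
}
```

On Android the same pattern would use android.util.Log, landing the output alongside Espresso’s own interaction log in logcat.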


Next Steps

This blog post is a snapshot of where I’ve got to. I’ll publish updates as I learn and discover more.

Further reading

My presentations at the Agile India 2017 Conference

I had an excellent week at the Agile India 2017 conference ably hosted by Naresh Jain. During the week I led a 1-day workshop on software engineering for mobile apps. This included various discussions and a code walkthrough so I’m not going to publish the slides I used out of context. Thankfully much of the content also applied to my 2 talks at the conference, where I’m happy to share the slides. The talks were also videoed, and these should be available at some point (I’ll try to remember to update this post with the relevant links when they are).

Here are the links to my presentations

Improving Mobile Apps using an Analytics Feedback Approach (09 Mar 2017)

Julian Harty Does Software Testing need to be this way (10 Mar 2017)

Mobile Testers Guide to the Galaxy slides presented at the Dutch Testing Day

I gave the opening keynote at the Dutch Testing Day conference in Groningen, NL. Here are the slides: Don’t Panic Mobile Testers Guide to the Galaxy (21 Nov 2013), compressed. As you may infer from the filename, I compressed the contents to reduce the size of the download for you.

These slides are an updated set from the material I presented at SQuAD in September 2013.

Free continuous builds to run your automated Android Selenium WebDriver tests

Last week I helped with various workshops for the testingmachine.eu project. The project has implemented virtual machine technology to enable automated web tests to run on various operating systems more easily, without needing physical machines for each platform.

One of the friction points with test automation is the ease of deployment and execution of automated tests each time the codebase is updated. So I decided to try using github and travis-ci to see if we could automatically deploy and run automated tests written using Selenium WebDriver that used Android as the host for the automated tests. If we could achieve this, potentially we’d reduce the friction and the amount of lore people would need to know in order to get their tests to run. I had some experience of building Android code using travis-ci, which provided a good base to work from, since building Android code on travis-ci (and on continuous builds generally) can be fiddly and brittle to changes in the SDK, etc.

From the outset we decided to implement our project in small discrete, traceable steps. The many micro-commits to the codebase are intended to make the steps relatively easy to comprehend (and tweak). They’re public at https://github.com/julianharty/android-webdriver-vm-demo/commits/master. We also configured travis-ci to build this project from the outset to enable us to test the continuous build configuration worked and address any blockages early before focusing on customising the build to run the specific additional steps for Android Selenium WebDriver.

We used git subtree (an optional addition to git) to integrate the existing sample tests from the testingmachine.eu project while allowing that project to retain a distinct identity, and to make that project easy to replace with ‘your’ code.

There were some fiddly things we needed to address; for instance, the newer Android Driver seems to trigger timeouts for the calling code (the automated tests), and this problem took a while to identify and debug. However, within 24 hours the new example git project was ready and working: https://travis-ci.org/julianharty/android-webdriver-vm-demo

I hope you will be able to take advantage of this work and it’ll enable you to run some automated tests emulating requests from Android phones to your web site.  There’s lots of opportunity to improve the implementation – feel free to fork the project on github and improve it 🙂

Slides from my talk at SFSCon 2013

I gave a brief presentation, in English, at https://www.sfscon.it/program/2013

The topics include:

  • An introduction to software test automation and the Selenium project
  • Examples of how e-Government services differ in various web browsers and where the differences adversely affect some services for the users
  • A summary of pre-conference workshops for the testingmachine.eu project
  • Some suggestions to improve the testing and even the design of e-Government web services
  • Encouragement to get involved in the testingmachine.eu project.

Here are the slides Testing Web Applications (rev 15 Nov 2013) small

Human Testing for Mobile Apps

Automated software tests are topical, as they seem to be replacing much of the testing done by humans. Automated tests are faster, provide early feedback, and cost little to run many times. Agile projects need automated tests to keep up with the frequent builds, which may arrive tens or hundreds of times a day and need testing.

So human testing seems to be gathering cobwebs, even despised as unproductive, low-skilled work done by testers who don’t have the ‘skills’ to write automated tests. However, as an industry we ignore testing by humans at our peril. There’s so much testing that’s beyond practical reach of automated tests. It’s time to revive interactive testing performed by motivated and interested humans. This talk will help you to find a new impetus and focus for your interactive testing to complement automated tests.

Feelings and emotions are what users will judge your apps on, so let’s test and explore how users may feel about the mobile apps. Michael Bolton published an insightful article on this: “I’ve Got a Feeling: Emotions in Testing”.

Fast, efficient testing can augment the repetitive automated testing. BugFests, where a group of people meet to test the same piece of software together for up to an hour can be extremely productive at finding problems the automated tests haven’t.

Another technique is to move both you (from place to place) and the phone (by rotating it from portrait to landscape mode, etc.); this may help find and expose bugs which are hard for your automated tests to discover.

I will be giving a keynote at VistaCon 2013 in April 2013 on this topic. Please email me if you would like to get involved in the discussion, share ideas, criticize, etc.

Android Test Automation Getting to grips with UI Automator

Over the last week I have spent about a day of effort getting to grips with the recently launched UI Automator test automation framework for Android. It was launched with version 16 of Android (Android 4.1); however, on 4.1 devices the framework doesn’t even have all the documented methods available. With version 17 of Android (Android 4.2), support has improved to the point that the examples can work acceptably. Here is the official example: http://developer.android.com/tools/testing/testing_ui.html

However, in the minor update between Android 4.2.1 and Android 4.2.2, someone seems to have broken the support for automatic scrolling through pages of results. I have reported the problem on the adt-dev forum (https://groups.google.com/forum/?fromgroups=#!topic/adt-dev/TjeewtpNWf8), which seems to be where the Android development team monitors comments. I have implemented a workaround, using a helper method, below:

    /**
     * Launches an app by its name.
     * @param nameOfAppToLaunch the localized name; an exact match is required to launch it.
     */
    protected static void launchAppCalled(String nameOfAppToLaunch) throws UiObjectNotFoundException {
        UiScrollable appViews = new UiScrollable(new UiSelector().scrollable(true));
        // Set the swiping mode to horizontal (the default is vertical)
        appViews.setAsHorizontalList();
        appViews.scrollToBeginning(10);  // Otherwise the apps may be on a later page of apps.
        int maxSearchSwipes = appViews.getMaxSearchSwipes();

        UiSelector selector;
        selector = new UiSelector().className(android.widget.TextView.class.getName());
        UiObject appToLaunch;

        // The following loop is to workaround a bug in Android 4.2.2 which
        // fails to scroll more than once into view.
        for (int i = 0; i < maxSearchSwipes; i++) {
            try {
                appToLaunch = appViews.getChildByText(selector, nameOfAppToLaunch);
                if (appToLaunch != null) {
                    // Simulate a user click to launch the app.
                    appToLaunch.clickAndWaitForNewWindow();
                    break;
                }
            } catch (UiObjectNotFoundException e) {
                System.out.println("Did not find match for " + e.getLocalizedMessage());

                // Work around the scrolling bug by manually paging forward.
                for (int j = 0; j < i; j++) {
                    System.out.println("scrolling forward 1 page of apps.");
                    appViews.scrollForward();
                }
            }
        }
    }
I ended up writing several skeletal demo Android apps to help me explore the capabilities of UI Automator. In each case I was working through publicly reported problems on http://stackoverflow.com where I’ve posted answers and feedback to several reported problems.

Here are the links to my comments:




Strengths of UI Automator

The key strengths include:

  • We can test most applications, including Google’s installed apps such as Settings. Thankfully the example from the Android site does just that, albeit at a perfunctory level; the stackoverflow example that changes the Wi-Fi setting provides a better illustration of what we can now do. Because the tests interact with the underlying objects, they have a direct connection to the app being tested, rather than crude interactions such as clicking at screen locations, OCR, etc.
  • UI Automator relies on the platform’s underlying support for Accessibility, and therefore may help to encourage improved support for accessible Android apps as developers refine their apps to make them testable by UI Automator.
  • We can test apps on several devices from one computer, through related changes to the Android build tools.
  • There are debug and exploration tools available on both the device (using adb shell uiautomator) and from my computer, using uiautomationviewer.

Weaknesses of UI Automator

  • Text based matching makes testing localized apps much harder than using the older Android Instrumentation which could easily share resource files with the app being tested.
  • There is virtually no documentation or examples, and the documentation that does exist doesn’t provide enough clues to address key challenges e.g. obtaining the text from WebViews.
  • UI Automator cannot be used when Accessibility features such as Explore-by-Touch are enabled on the device.
  • There are bugs in the current version of Android and there’s no easy way to revert devices to 4.2.1.
  • Automation is very slow e.g. paging through the set of apps takes several seconds to go to the next page.

Other characteristics

  • All the tests are bundled into a single jar file, deployed to the device. This risks one set of tests overwriting another set.

Further reading