Trinity Testing

[reposted from my old Blogger account]

In 2009 I needed to find ways to significantly improve my testing of a major, unfamiliar product that had previously had two long-term, full-time test engineers assigned to it. I had a few days a month, so there was no way I could reproduce their testing. I realized that every person on the project had gaps in their knowledge and understanding (e.g. the developers often didn’t understand much of the business aspects), which meant that features were slow to release, hard to develop, and hard to test; when they finally reached production they often had flaws and limitations.

Testing of features was performed by test engineers and the business teams, and typically lagged development by several weeks, so when bugs were (finally) reported, the developer needed time to re-acquaint themselves with their work. That overhead slowed down fixes and, in practice, meant that lower-priority issues might be deferred a month or two rather than being fixed in the release being tested.

Trinity Testing was my approach to addressing all the issues I’ve mentioned so far. It works by combining three people:

  • The developer of a feature or fix
  • The business, feature, or story owner i.e. the domain expert for the feature/fix
  • The test engineer

The people act as peers in the discussion: no one is ‘in charge’ or the ‘decision maker’; instead, each participant is responsible for their own commitments and actions.

We initially met for no more than 1 hour per developer after a release was created. We shared a computer and screen and ‘walked through’ each feature or significant change, spending a few minutes per item. Generally the developer was responsible for the walk-through: they described how their code worked and received comments and questions from the other 2 participants, e.g. the domain expert asked how the feature behaved for other types of account, and the test engineer asked how they could test the new feature. People noted any follow-up work or actions, e.g. the developer might need to revise their implementation based on what they learnt during the session.

At the end of each session, each participant follows up on their work, e.g. the test engineer may target additional testing at areas of concern (to any of the participants).

Within 2 releases, the Trinity Testing sessions had proved their value. Everyone who participated found them useful and better than the traditional development and testing process. Furthermore I was able to test each release in about 2 to 3 days, which reduced the manual testing effort to about 1/10th of the original.

Trinity Testing sessions are ideal at a couple of stages in the lifecycle of a feature or fix:

  • At the outset, when the design is being considered
  • As soon as practical after the feature is ‘code complete’, preferably before the formal release candidate is created and while the developer knows the software intimately

At design time, a Trinity Testing session should:

  • help devise the tests that will be needed to confirm the feature will work as desired
  • help the tester to know what to look for to spot problems (how would I know the software is not working?)
  • help the developer to know what the feature/fix needs to do, so they don’t need to guess as often
  • give the ‘owner’ justified confidence that their feature/fix will be more correct, and available sooner

A year on I’m continuing to receive positive comments about how useful Trinity Testing was for the project.

Note: Janet Gregory and Lisa Crispin devised the ‘power of three’ approach to testing several years before I ‘discovered’ Trinity Testing. I wasn’t aware of their work at the time. You might be interested in reading their work as our approaches are similar but not identical. Their work is available in their Agile Testing book http://www.agiletester.ca/.

Man and Machine in Perfect Harmony?

Ah, the bliss and eager joy when we can operate technology seamlessly and productively, making effective progress rather than mistakes; where the technology helps us make better, informed decisions. Sadly this seldom happens in operations, whether administering the software or trying to address problems.

HCI for systems software and tools

Testing the operating procedures and the tools and utilities used to configure, administer, safeguard, diagnose, recover, etc. may be some of the most important testing we do. The context, including emotional and physical aspects, is an important consideration and may make the difference between performing the desired activity and exacerbating problems, confusion, etc. For instance, is the user tired, distracted, working remotely, under stress? Each of these may increase the risk of more, and larger, mistakes.

Usability testing can help us consider and design aspects of the testing. For instance, how well do the systems software and tools enable people to complete tasks effectively, efficiently and satisfactorily?

Standard Operating Procedures

Standard Operating Procedures (SOPs) can help people and organisations to deliver better outcomes with fewer errors. For a recent assignment testing Kafka, the work needed to include testing the suitability of the SOPs, for instance to determine the chances of someone making an inadvertent mistake that caused a system failure or compounded an existing challenge or problem.

Testing recovery is also relevant, and there may be many forms of recovery. In terms of SOPs, we want and expect most recovery scenarios to be covered, and for those procedures to be trustworthy. Recovery may be person- or business-centred (for a particular user or organisation) and/or technology-centred, e.g. recovering at-risk machine instances in a cluster of servers.

OpsDev & DevOps

OpsDev and DevOps may help improve the understanding and communication between development and operations roles and foci. They aren’t sufficient by themselves.

Further reading

Disposable test environments

Disposable:

  • “readily available for the owner’s use as required”
  • “intended to be thrown away after use”

https://en.oxforddictionaries.com/definition/disposable

For many years test environments were hard to obtain, maintain, and update. Technologies including Virtual Machines and Containers reduce the effort, cost, and potentially the resources needed to provide test environments as and when required. Picking the most appropriate option for a particular testing need is key.

For a recent project, to test Kafka, we needed a range of test environments, from lightweight, ephemeral, self-contained environments to ones involving tens of machines distributed at least 100 km apart. Most of our Kafka testing used AWS to provide the compute instances and connectivity, for environments that were useful for days to weeks. However, we also used ESXi and Docker images: ESXi when we wanted to test on particular hardware and networks; Docker, conversely, enabled extremely lightweight experiments, for instance with self-contained Kafka nodes where the focus was on interactive learning rather than longer-lived evaluations.
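The Docker style of disposable environment can be sketched in a few lines. This is a minimal illustration, not the project’s actual tooling: the image name, container name, and port are assumptions chosen for the example.

```python
import subprocess

# Image and container names are illustrative assumptions; any single-node
# Kafka image with a similar interface would do.
IMAGE = "apache/kafka:latest"
NAME = "kafka-throwaway"

def start_cmd(image: str = IMAGE, name: str = NAME, port: int = 9092) -> list[str]:
    """Build the command to start a disposable single-node Kafka container.

    --rm makes the container genuinely disposable: Docker deletes the
    container, and everything inside it, as soon as it stops.
    """
    return ["docker", "run", "--rm", "-d",
            "--name", name,
            "-p", f"{port}:{port}",
            image]

def stop_cmd(name: str = NAME) -> list[str]:
    """Build the command to stop (and, thanks to --rm, delete) the container."""
    return ["docker", "stop", name]

# Typical use: spin up, experiment interactively, then throw it all away.
#   subprocess.run(start_cmd(), check=True)
#   ... interactive learning against localhost:9092 ...
#   subprocess.run(stop_cmd(), check=True)
```

The `--rm` flag is what makes the environment disposable in the dictionary sense above: nothing survives the container’s stop unless it was deliberately copied out first.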

Some, though not all, of the contents of a test environment have a life beyond that of the environment. Test scripts, the ability to reproduce the test data and context, key results, and lab notes tend to be worth preserving.

Key Considerations

  • what to keep and what to discard: We want ways to identify what’s important to preserve and what can be usefully and hygienically purged and freed.
  • timings: how soon do we need the environment, what’s the duration, and when do we expect it to be life-expired?
  • fidelity: how faithful and complete does the test environment need to be?
  • count: how many nodes are needed?
  • tool support: do the tools work effectively in the proposed runtime environment?
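The timings, fidelity, and count considerations can be folded into a rough selection rule. The following is a sketch only; the thresholds and environment labels are assumptions made for illustration, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class Need:
    duration_days: float  # timings: how long the environment must live
    nodes: int            # count: how many machines are involved
    real_hardware: bool   # fidelity: must it run on specific hardware/networks?

def choose_environment(need: Need) -> str:
    """Map a testing need to a broad environment type.

    The thresholds here are illustrative assumptions, not rules.
    """
    if need.real_hardware:
        return "ESXi on target hardware"   # fidelity outweighs convenience
    if need.duration_days < 1 and need.nodes == 1:
        return "Docker"                    # lightweight, ephemeral, self-contained
    return "AWS instances"                 # multi-node, useful for days to weeks
```

For example, an interactive single-node experiment maps to Docker, while a week-long twenty-node distributed test maps to cloud instances, mirroring the choices described above.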

Further reading