Ah, the bliss and eager joy when we can operate technology seamlessly and productively, making effective progress rather than mistakes; where the technology helps us make better, informed decisions. Sadly this seldom happens in operations – when administering the software – or trying to address problems.
HCI for systems software and tools
Testing the operating procedures, the tools and utilities to configure, administer, safeguard, diagnose and recover, etc. may be some of the most important testing we do. The context, including emotional & physical aspects, are important considerations and may make the difference between performing the desired activity versus exacerbating problems, confusion, etc. For instance, is the user tired, distracted, working remotely, under stress? each of these may increase the risk of more and larger mistakes.
Usability testing can help us consider and design aspects of the testing. For instance, how well do the systems software and tools enable people to complete tasks effectively, efficiently and satisfactorily?
Standard Operating Procedures
Standard Operating Procedures (SOP’s) can help people and organisations to deliver better outcomes with fewer errors. For a recent assignment testing Kafka, testing needed to include testing the suitability of the SOP’s, for instance to determine the chances of someone making an inadvertent mistake that caused a system failure or compounded an existing challenge or problem.
Testing Recovery is also relevant. There may be many forms of recovery. In terms of SOPs we want and expect most scenarios to be included in the SOPs and to be trustworthy. Recovery may be for a particular user or organisation (people / business centred) and/or technology centred e.g. recovering at-risk machine instances in a cluster of servers.
OpsDev & DevOps
OpsDev and DevOps may help improve the understanding and communication between development and operations roles and foci. They aren’t sufficient by themselves.