Tools for testing Kafka

Context

I realised software tools would help us to test Kafka in two key dimensions: performance and robustness. I started from knowing little about the tools or their capabilities, although I was aware of JMeter, which I’d used briefly over a decade ago.

My initial aim was to find a way we could generate and consume load. This load would then be the background for experimenting with robustness testing, to see how well the systems and the data replication would work and cope in inclement conditions. By inclement I mean varying degrees of adverse conditions, from poor network conditions through to compound error conditions where the ‘wrong’ nodes were taken down during an upgrade while the system was trying to catch up from a backlog of transactions, and so on. I came up with the concept of a Beaufort Scale of environment conditions, which I’m writing about separately.
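To make the idea concrete, here is a hypothetical sketch of how such a scale might drive fault injection. It is purely illustrative: the levels, the numbers, and the use of Linux’s tc netem are my assumptions, not what we actually ran.

```python
# Hypothetical illustration of a "Beaufort Scale" of network conditions:
# map a severity level to degradation settings applied with Linux `tc netem`.
# All levels, numbers, and the device name are invented for illustration.
# Requires root privileges on a Linux host.
import subprocess

CONDITIONS = {
    0: None,                           # calm: no interference
    3: "delay 50ms 10ms loss 0.5%",    # moderate: some latency and loss
    6: "delay 200ms 50ms loss 5%",     # severe: slow, lossy links
    9: "delay 1000ms 300ms loss 20%",  # storm: barely usable network
}

def apply_conditions(level: int, device: str = "eth0") -> None:
    """Apply (or clear) netem settings for the given severity level."""
    # Clear any existing qdisc first; ignore errors if none was set.
    subprocess.run(["tc", "qdisc", "del", "dev", device, "root"], check=False)
    params = CONDITIONS.get(level)
    if params:
        subprocess.run(
            ["tc", "qdisc", "add", "dev", device, "root", "netem", *params.split()],
            check=True,
        )
```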

Robustness testing tools

My research very quickly led me to Jepsen, the work of Kyle Kingsbury. He has some excellent videos available https://www.youtube.com/watch?v=NsI51Mo6r3o and https://www.youtube.com/watch?v=tpbNTEYE9NQ together with open-source software tools and various articles https://aphyr.com/posts on testing various technologies https://jepsen.io/analyses including Zookeeper and Kafka. Jay Kreps provided his perspective on the testing Kyle did http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen and Kyle’s work has helped guide various improvements to Kafka.

So far, so good…

However, when I started investigating the practical aspects of using Jepsen to test newer versions of Kafka I ran into a couple of significant challenges, at least for me. I couldn’t find the original scripts, and the ones I did find https://github.com/gator1/jepsen/tree/master/kafka were written in Clojure, an unfamiliar language, and targeted an older version of Kafka (0.10.2.0). More importantly, they relied on Docker. While Docker is an incredibly powerful and fast tool, the client wasn’t willing to trust tests run in Docker environments (also, our environment needed at least 20 instances to test even the smallest configuration).

The next project to try was https://github.com/mbsimonovic/jepsen-python, written in Python, a language I and other team members knew well enough to try. However, we again ran into the blocker that it used Docker. There seemed to be some potential to test clusters if we could get one of its dependencies, Blockade, to support Docker Swarm, so I asked what it would take to add that support; quite a lot, according to one of the project team https://github.com/worstcase/blockade/issues/67.

By this point we needed to move on and focus on testing the performance, scalability, and latency of Kafka, including inter-regional data replication, so we had to pause our search for automated tools or frameworks to control the robustness aspects of the testing.

Performance testing tools

For now I’ll try to stay focused on the tools rather than trying to define performance testing vs load testing, etc., as there are many disagreements about what distinguishes these and other related terms. We knew we needed ways to generate traffic patterns that ranged from simple text to ones that closely resembled the expected production traffic profiles.
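To illustrate the kind of load generation we were after, here’s a minimal sketch using the kafka-python client (my choice for illustration; we actually used JMeter-based tools, as described below). The broker address, topic name, and message shape are all assumptions.

```python
# Minimal load-generation sketch. Assumptions: kafka-python is installed,
# a broker is reachable at localhost:9092, and the topic "load-test" exists.
# Messages carry a send timestamp and a variable-size payload to loosely
# mimic a production-like traffic profile.
import json
import random
import time
import uuid

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for _ in range(10_000):
    message = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),                        # send time; useful for latency later
        "payload": "x" * random.randint(64, 512), # vary size to mimic real traffic
    }
    producer.send("load-test", value=message)

producer.flush()  # ensure everything is actually sent before exiting
```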

Over the course of the project we discovered at least five candidates for generating load:

  1. kafkameter: https://github.com/BrightTag/kafkameter
  2. pepper-box: https://github.com/GSLabDev/pepper-box
  3. Kafka tools: https://github.com/apache/kafka (see the sketch after this list)
  4. sangrenel: https://github.com/jamiealquiza/sangrenel
  5. ducktape: https://github.com/confluentinc/ducktape
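As a taste of option 3, Apache Kafka ships with a producer performance test script, bin/kafka-producer-perf-test.sh. Here’s a minimal sketch of driving it from Python; the installation path, broker address, and parameter values are assumptions.

```python
# Sketch: invoke Kafka's bundled producer perf test and capture its output.
# Assumptions: Kafka is installed at KAFKA_HOME and a broker is reachable.
import subprocess

KAFKA_HOME = "/opt/kafka"     # assumption: local Kafka installation path
BOOTSTRAP = "localhost:9092"  # assumption: broker address

result = subprocess.run(
    [
        f"{KAFKA_HOME}/bin/kafka-producer-perf-test.sh",
        "--topic", "perf-test",
        "--num-records", "1000000",
        "--record-size", "1024",
        "--throughput", "10000",  # target records/sec; -1 means unthrottled
        "--producer-props", f"bootstrap.servers={BOOTSTRAP}",
    ],
    capture_output=True, text=True, check=True,
)
# The final summary line reports records/sec, MB/sec, and latency percentiles.
print(result.stdout)
```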

Both kafkameter and pepper-box integrate with JMeter. Of the two, pepper-box was newer and was inspired by kafkameter. As kafkameter would clearly have needed significant work to suit our needs, we started experimenting with pepper-box. We soon forked the project so we could easily experiment and add functionality without disturbing the parent project, and without delaying the testing we needed to do while waiting for approvals, etc.

I’ve moved the details of working with pepper-box to a separate blog post: http://blog.bettersoftwaretesting.com/2018/04/working-with-pepper-box-to-test-kafka/

Things I wish we’d had more time for

There’s so much that could be done to improve the tooling and approach to using testing tools to test Kafka. We needed to keep an immediate focus during the assignment in order to provide feedback and results quickly. Furthermore, the testing ran into the weeds, particularly during the early stages, and configuring Kafka correctly was extremely time-consuming as we were newcomers to several of the technologies. We really wanted to run many more tests earlier in the project and establish regular, reliable and trustworthy test results.

There’s plenty of scope to improve both pepper-box and the analysis tools. Some of these improvements have been identified on the respective GitHub repositories.

What I’d explore next

The biggest immediate improvement, at least for me, would be to focus on using trustworthy statistical analysis tools such as R so we could automate more of the processing and graphing aspects of the testing.
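As a flavour of that automation, here’s a minimal sketch in Python with pandas (R would be analogous). The results file and its latency_ms column are assumptions about how the consumer might record its measurements.

```python
# Sketch: automate the summary statistics and graphing of test results.
# Assumption: the consumer wrote per-message results to results.csv with a
# 'latency_ms' column. Requires pandas and matplotlib.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("results.csv")

# Percentile summary: medians and tail latencies are what we care about most.
summary = df["latency_ms"].describe(percentiles=[0.5, 0.95, 0.99])
print(summary)

# A histogram gives a quick visual check for multi-modal behaviour.
df["latency_ms"].plot.hist(bins=100, title="End-to-end latency (ms)")
plt.savefig("latency_histogram.png")
```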

Further topics

Here are topics I’d like to cover in future articles:

  • Managing and scaling the testing, particularly how to run many tests and keep track of the results, while keeping the environments healthy and clean.
  • Designing the producers and consumers so we could measure throughput and latency using the results collected by the consumer code (a minimal sketch of this idea follows this list).
  • Tradeoffs between fidelity of messages and the overhead of increased fidelity (hi-fi costs more than lo-fi at runtime).
  • Some of the many headwinds we faced in establishing trustworthy, reliable test environments and simply performing the testing.
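For the second topic above, here’s a minimal sketch of the consumer-side measurement idea, assuming kafka-python and that each message carries the producer’s send timestamp (as in the earlier producer sketch). It glosses over clock synchronization between producer and consumer hosts, which matters in practice.

```python
# Sketch: measure end-to-end latency on the consumer side.
# Assumptions: kafka-python is installed, the producer embeds a send
# timestamp under "ts" in each JSON message, and producer/consumer clocks
# are reasonably well synchronized.
import json
import time

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "load-test",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

latencies = []
for record in consumer:
    # Difference between receive time and embedded send time, in seconds.
    latencies.append(time.time() - record.value["ts"])
    if len(latencies) >= 10_000:
        break

latencies.sort()
print(f"median: {latencies[len(latencies) // 2] * 1000:.1f} ms, "
      f"p99: {latencies[int(len(latencies) * 0.99)] * 1000:.1f} ms")
```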