Working with pepper-box to test Kafka


We needed to do performance testing of multi-regional Kafka clusters, and we ended up using pepper-box for most of our work. During the assignment we first had to understand, then use, then extend and enhance the capabilities of pepper-box. Here is an overview of what we did. We published our code and other related materials on GitHub, so you can see more of the details at that site:

Pepper-box and jmeter

In order to use pepper-box we first needed to understand the fundamentals of jmeter. Using jmeter to test a non-web protocol (i.e. Kafka) ended up taking significant effort and time: we spent a long while learning about various aspects of compiling the code, configuring jmeter, and running the tests.
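For anyone following a similar path, the build-and-install cycle we kept repeating looked roughly like this. The jar name, jmeter location and test plan name below are illustrative placeholders, not the exact names from our project:

```shell
# Build the plugin and drop it into jmeter's extension directory
# (jar and test plan names are illustrative placeholders)
mvn clean package -DskipTests
cp target/pepper-box-1.0-SNAPSHOT.jar "$JMETER_HOME/lib/ext/"
# Run a test plan headlessly and collect results
"$JMETER_HOME/bin/jmeter" -n -t kafka-producer-test.jmx -l results.jtl
```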

Thankfully, as there were both kafkameter and pepper-box, we were able to learn a lot from various articles as well as from the source code. Key articles include:

The blazemeter article even included an example consumer script written in Groovy. We ended up extending this script significantly and making it available as part of our fork of pepper-box (since it didn’t seem sensible to create a separate project for this script).

As ever, there was lots of other reading and experimentation needed in order to develop the jmeter plugins reliably and consistently. Lowlights included needing to convince both the machine and maven that we actually needed to use Java 8.
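One fix that helped was pinning the compiler level in the pom. A minimal sketch (pepper-box’s actual pom may organise this differently):

```xml
<!-- Sketch: force maven to compile for Java 8 -->
<properties>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```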

Extending pepper-box to support security protocols

Business requirements mandated that data would be secured throughout the system. Kafka supports various security mechanisms. SASL enables nodes to authenticate themselves to Kafka instances, and connections are secured using what’s known as SSL, although nowadays the security is actually provided by its successor, TLS.

A key facet of the work was adding support to both the producer and consumer code so they could be used with clusters configured with and without security, in particular SASL_SSL. The code was relatively easy to write, but debugging issues with it was very time-consuming, especially as we had to test in a variety of environments, each with a different configuration, and none of the team had prior experience of configuring Kafka with SASL_SSL before the project started.
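For reference, a client talking to a SASL_SSL cluster needs properties along these lines. The property names are standard Kafka client settings; the hostname, credentials and paths are placeholders:

```properties
# Client config sketch for a SASL_SSL-secured cluster
# (hostname, credentials and paths are placeholders)
bootstrap.servers=broker1.example.com:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="loadtest" password="changeit";
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=changeit
```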

We ran into multiple issues related to the environments, keeping the Kafka clusters healthy, and getting replication to happen without major delays. I may be able to cover some of the details in subsequent articles. We also realised that the pepper-box java sampler (as such plugins are called in jmeter terminology) used a lot of CPU, and we needed to start running load generators and consumers in parallel.

Standalone pepper-box

We eventually discovered that the combination of jmeter and the pepper-box sampler was maxing out, unable to generate the loads we wanted for testing various aspects of performance and latency. Thankfully the original creators of pepper-box had provided a standalone load generation utility which was able to generate significantly higher loads. We had to trade off the extra performance against the many capabilities of jmeter and the plugins that have been developed for it over the years: we’d have to manage synchronisation of load generators on multiple machines ourselves, and so on.
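Launching the standalone generator is a matter of running the jar directly. The class name and flags below reflect our reading of the pepper-box README at the time, so treat them as illustrative rather than authoritative; paths and values are placeholders:

```shell
# Standalone load generation, bypassing jmeter entirely
# (flags per the pepper-box README; paths and values are placeholders)
java -cp pepper-box-1.0-SNAPSHOT.jar com.gslab.pepper.PepperBoxLoadGenerator \
  --schema-file /path/to/schema.txt \
  --producer-config-file /path/to/producer.properties \
  --throughput-per-producer 10000 \
  --test-duration 300 \
  --num-producers 4
```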

The next challenge was deciding whether to develop an equivalent standalone consumer ourselves. In the end we did, partly because jmeter had lost credibility with the client, so it wasn’t viable to continue using the existing consumer.

Developing a pepper-box consumer

The jmeter-groovy-consumer wasn’t maxing out; however, it made little sense to run dissimilar approaches (a standalone producer written in Java combined with jmeter + Groovy), which added to the overall complexity and complications for more involved tests. Therefore we decided to create a consumer modelled on the standalone producer. We didn’t end up adding rate-limiting, as it didn’t really suit the testing we were doing; otherwise the two are fairly complementary tools. The producer sends a known message format which the consumer parses to calculate latency, writing the output to a CSV file per topic. The consumer polls for messages using default values (e.g. a limit of 500 messages per poll request). These could easily be adapted with further tweaks and improvements to the code.
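The latency bookkeeping itself is simple. A minimal sketch, assuming (hypothetically, for illustration) that the first comma-separated field of each message body is the epoch-millis send timestamp:

```java
// Sketch of the per-message latency calculation the consumer performs.
// The message layout (leading send-timestamp field) is an assumption
// made for illustration, not pepper-box's exact schema.
public class LatencyRecord {

    // Extract the send timestamp, assuming the first comma-separated
    // field of the message body is the epoch-millis send time.
    static long sendTimestamp(String message) {
        return Long.parseLong(message.split(",", 2)[0]);
    }

    // Latency is simply receive time minus the embedded send time.
    static long latencyMillis(String message, long receivedAtMillis) {
        return receivedAtMillis - sendTimestamp(message);
    }

    // One CSV line per message: topic, send time, receive time, latency.
    static String csvLine(String topic, String message, long receivedAtMillis) {
        long sent = sendTimestamp(message);
        return topic + "," + sent + "," + receivedAtMillis + ","
                + (receivedAtMillis - sent);
    }
}
```

In the real consumer, lines like these are produced inside the poll loop and appended to the per-topic CSV file.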

Using polling leads to a couple of key effects:

  1. It uses less CPU. Potentially several Consumers can be run on the same machine to process messages from several Producers (Generators) running across a bank of machines.
  2. The granularity of the timing calculations is constrained by the polling interval.
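Both effects can be tuned through standard Kafka consumer settings; the values below are illustrative:

```properties
# Consumer tuning sketch: polling batch size vs timing granularity
# max.poll.records defaults to 500; lowering it gives finer-grained receive times
max.poll.records=500
# return fetches as soon as any data is available rather than waiting to batch
fetch.min.bytes=1
# cap how long the broker may delay a fetch response (milliseconds)
fetch.max.wait.ms=100
```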

Summary of the standalone pepper-box tools

Both the producer and consumer are more functional than elegant, and neither is very forgiving of errors or of missing or incorrect parameters. It’d be great to improve their usability at some point.