Damn you auto-create

Inspired by the entertaining http://www.damnyouautocorrect.com/ web site, here are some thoughts on the benefits and challenges of having auto-create enabled for Kafka topics. The relevant broker setting, documented at https://kafka.apache.org/documentation/#brokerconfigs, is:

`auto.create.topics.enable`

At first, auto-create seems like a convenience, even a blessing, because developers don’t need to write code to explicitly create topics. On a given project the developers can focus on using the system as a service for sharing user-specified sets of data rather than writing extra code to interact with Zookeeper, etc. (Newer releases of Kafka include the AdminClient API, which deals with the Zookeeper aspects.)

Effects of relying on auto-create: topics are created with the broker’s default (configured) partition count and replication factor, that is, the `num.partitions` and `default.replication.factor` settings. These may not be ideal for a given topic and its intended use(s).

Adverse impacts of using auto-create

Deleting topics: The project uses Confluent Replicator to replicate data from Kafka Cluster to Kafka Cluster. As part of our testing, lots of topics were created. We wanted to delete some of these topics but discovered they were virtually impossible to kill, as the combination of Confluent Replicator and the Kafka Clusters was resurrecting the topics before they could be fully expunged. This caused almost endless frustration and adversely affected our testing, as we couldn’t get the environment sufficiently clean to run tests in controlled circumstances (Replicator was busy servicing the defunct topics, which limited its ability to focus on the topics we wanted to replicate in particular tests).

Coping with delays and problems creating topics: At a less complex level, auto-creation takes a while to complete and seems to happen in the background. When the tests (and the application software) tried to write to the topic immediately, various problems occurred from time to time. Knowing that problems can occur is useful in terms of performance, reliability, etc.; however, it complicates the operational aspects of the system, especially as the errors surface when producing data (what the developers and users think is happening) rather than in the orthogonal step of creating a topic so that data can be produced.

Lack of clarity or traceability on who (what) created topics: Topics could be auto-created when code tried to write (produce), which was more-or-less what we expected. However, they could also be auto-created by trying to read (consume). The Replicator duly set up replication for that topic. For various reasons topics could be created on one or more clusters with the same name, and replication happened both locally (within a Kafka Cluster) and to another cluster. We ended up with a mess of topics on various clusters, which was compounded by the challenges of cleaning up (deleting) the various topics. It ended up feeling like we were living through the after-effects of the Sorcerer’s Apprentice!

From a testing perspective

We ended up adding code to our consumers that checked and waited for the topic to appear in Zookeeper before trying to read from it. This, at least, reduced some of the confusion and enabled us to unambiguously measure the propagation time for Confluent Replicator for topics it needed to replicate.
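
The wait-before-consuming idea can be sketched roughly as follows. This is not the code we used: it is a minimal Python illustration that polls whatever topic lister you give it, rather than querying Zookeeper directly. The names `wait_for_topic` and `list_topics`, and the timeout and polling values, are hypothetical choices for the example.

```python
import time


def wait_for_topic(list_topics, topic, timeout_s=30.0, poll_interval_s=0.5):
    """Poll until `topic` is visible and return the seconds spent waiting.

    `list_topics` is an assumed zero-argument callable that returns the
    collection of topic names currently known to the cluster.
    Raises TimeoutError if the topic does not appear within `timeout_s`.
    """
    start = time.monotonic()
    while True:
        if topic in list_topics():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(
                f"topic {topic!r} not visible after {timeout_s}s"
            )
        time.sleep(poll_interval_s)
```

With the confluent-kafka Python client, one plausible lister would be `lambda: admin.list_topics(timeout=5).topics`, where `admin` is an `AdminClient`. The returned wait time gives a crude measure of how long a topic takes to appear, which is separate from how long the data then takes to arrive.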

We also wrote some code that explicitly created topics, rather than relying on auto-create, to determine how much effort was needed to remove the dependency on auto-create being enabled. That code amounted to fewer than 10 lines in the proof-of-concept. Production-quality code may need more in order to audit the creation and to log and report problems and any run-time failures.
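
To give a feel for the size of such code, here is a minimal sketch of explicit topic creation. It assumes the Python confluent-kafka client, which is not necessarily what the project used, and it is not our proof-of-concept. The function name `create_topic`, the topic name, the broker address, and the partition and replication values are all hypothetical.

```python
import logging

log = logging.getLogger("topic_setup")


def create_topic(admin, new_topic, timeout_s=30.0):
    """Create one topic explicitly and log the outcome.

    `admin` is assumed to behave like confluent_kafka.admin.AdminClient:
    create_topics() returns a dict mapping each topic name to a future.
    """
    futures = admin.create_topics([new_topic])
    for name, future in futures.items():
        try:
            future.result(timeout=timeout_s)  # raises if creation failed
            log.info("created topic %s", name)
        except Exception as exc:
            log.error("failed to create topic %s: %s", name, exc)
            raise


def main():
    # Imported here so create_topic() can be exercised without the library.
    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed address
    # Partition count and replication factor are chosen per topic here,
    # instead of inheriting the broker-wide defaults.
    create_topic(admin, NewTopic("orders.test", num_partitions=3, replication_factor=2))
```

Calling `main()` needs a reachable broker. Creating topics this way also leaves a log line showing what created each topic and when, which auto-create does not.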

Further reading

“Auto topic creation on the broker has caused pain in the past; And today it still causes unusual error handling requirements on the client side, added complexity in the broker, mixed responsibility of the TopicMetadataRequest, and limits configuration of the option to be cluster wide. In the future having it broker side will also make features such as authorization very difficult.” (From KAFKA-2410: Implement “Auto Topic Creation” client side and remove support from Broker side.)

 

Six months review: learning how to test new technologies

I’ve not published for over a year, although I have several draft blog posts written and waiting for completion. One of the main reasons I didn’t publish was the direction my commercial work has been going recently, into domains and fields I’d not worked in for perhaps a decade or more. One of those assignments was for six months and I’d like to review various aspects of how I approached the testing, and some of my pertinent experiences and discoveries.

The core technology uses Apache Kafka, which needed to be tested as a candidate for replicated data sharing between organisations. Various criteria took the deployment off the beaten track of Apache Kafka’s popular deployment models; that is, the use was atypical, and it was therefore important to understand how Kafka behaves for the intended use.

Kafka was new to me, and my experiences of some of the other technologies were sketchy, dated or both. I undertook the work at the request of the sponsor who knew of my work and research.

There was a massive amount to learn and, as we discovered, lots to do in order to establish a useful testing process, including establishing environments to test the system. I aim to cover these in a series of blog articles here.

  • How I learned stuff
  • Challenges in monitoring Kafka and underlying systems
  • Tools for testing Kafka, particularly load testing
  • Establishing and refining a testing process
  • Getting to grips with AWS and several related services
  • Reporting and Analysis (yes, in that order)
  • The many unfinished threads I’ve started