Running Kafka in KRaft Mode with Docker Compose

December 26, 2024

Explore setting up Apache Kafka in KRaft mode using Docker Compose.

Apache Kafka has become the undisputed champion of event streaming, message brokering, and log processing. It processes live data streams in real time, making it ideal for applications that require instant analytics, and its reliability and scalability let it handle millions of messages per second in both small and large-scale systems with high-throughput, low-latency requirements. But until recently, it came with a not-so-tiny caveat: ZooKeeper. While ZooKeeper did an excellent job managing metadata and coordination, it was also an extra dependency that engineers had to deploy, maintain, and troubleshoot. Enter KRaft (Kafka Raft), a mode that eliminates the need for the ZooKeeper coordination service, making Kafka leaner, more scalable, and easier to manage.

Let’s walk through setting up Kafka in KRaft mode using Docker Compose. We’ll cover:

  • Kafka and KRaft basics
  • Key Kafka concepts
  • Setting up Kafka in KRaft mode within Docker Compose
  • Code examples to get you started

Let’s get started!

Why Apache Kafka?

Apache Kafka is a powerful distributed streaming platform that has become a popular choice for many organizations. Here are some compelling reasons why people use Kafka:

  • Real-Time Data Processing: With Kafka, data flows through the system almost instantly, making it ideal for live data streams and real-time analytics.

  • Reliability and Scalability: Kafka can handle millions of data messages every second, making it an excellent choice for both small and large-scale systems that require high throughput and low latency.

  • Fault Tolerance: If something goes wrong, Kafka is designed to recover quickly and keep the system running smoothly, ensuring that your data remains safe and intact.

  • Decoupling of Systems: Kafka allows different systems to send and receive data without requiring them to know too much about each other’s inner workings, promoting flexibility and maintainability in complex architectures.

Understanding Key Kafka Concepts

Before diving deeper into Apache Kafka, let’s review some fundamental concepts:

  • Producer: The system or service that sends data to Kafka, similar to the sender in a mailing system.
  • Consumer: The system that reads or takes data from Kafka, serving as the recipient.
  • Topic: A channel or folder where Kafka stores messages. Producers send data to topics, and consumers read data from them. Think of topics as categories for your data, like “user-logins”, “orders”, or “payments”.
  • Broker: A Kafka server that holds and manages messages within topics.
  • Partition: Topics can be split into smaller pieces called partitions, which helps Kafka handle large amounts of data more efficiently.
  • Kafka Connect (not covered in this article): A framework for integrating Kafka with other systems, enabling data ingestion and extraction.
  • Serialization (not covered in this article): Kafka supports various serialization formats for messages, including String, JSON, and Avro, allowing flexibility in how data is represented.
  • KRaft Controllers: In KRaft mode, special brokers called controllers manage metadata and coordination. Unlike Zookeeper, KRaft controllers are built into Kafka itself.

These core concepts form the foundation of Apache Kafka’s architecture, enabling the platform to process high volumes of data in a scalable, fault-tolerant manner.
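
To make these concepts concrete, here is a minimal sketch using Kafka’s stock CLI tools. The broker address kafka1:19092 is a placeholder that matches the cluster we build later in this article.

# Create a topic named "user-logins" with 3 partitions (topic + partition)
kafka-topics --create --topic user-logins --partitions 3 --replication-factor 1 --bootstrap-server kafka1:19092

# A producer sends messages to the topic...
echo "user42 logged in" | kafka-console-producer --topic user-logins --bootstrap-server kafka1:19092

# ...and a consumer reads them back, fully decoupled from the producer
kafka-console-consumer --topic user-logins --from-beginning --bootstrap-server kafka1:19092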

Why Kafka in KRaft Mode?

Kafka in KRaft mode (Kafka Raft) replaces ZooKeeper with an internal Raft consensus mechanism, which is:

  • Simpler: No separate ZooKeeper deployment required.
  • More scalable: Kafka can better handle metadata without external coordination.
  • More efficient: Reduces operational complexity and failure points.
  • Faster: Improved metadata handling and startup times.

This makes KRaft a game-changer for Kafka deployments.

Modes: A Comparison between ZooKeeper and KRaft

Kafka has undergone significant changes over the years to improve its scalability, reliability, and fault tolerance. Two prominent modes of operation for Kafka are ZooKeeper Mode and KRaft Mode. Let’s delve into the details of each mode and compare their key features.

ZooKeeper Mode (Legacy Mode)

In the past, Kafka relied on Apache ZooKeeper for leader election, configuration management, and other administrative tasks. This setup is often called “Legacy Mode” or “ZooKeeper Mode”.

  • Leader Election: Each broker in the cluster maintains its own ZooKeeper connection to elect leaders and manage configuration.
  • Broker Configuration: ZooKeeper stores broker configurations, including metadata, topic assignments, and other administrative data.
  • Fault Tolerance: In case of failures or network partitions, ZooKeeper Mode can lead to prolonged downtime or loss of data.

KRaft Mode (Newer Architecture)

Kafka introduced KRaft Mode as an early-access feature in version 2.8 and declared it production-ready in 3.3. This newer architecture replaces the traditional ZooKeeper-based setup and aims to improve scalability, performance, and fault tolerance.

  • Leader Election: A Raft-based quorum of controller nodes handles leader election inside Kafka itself, eliminating the need for ZooKeeper.
  • Broker Configuration: Brokers store their configurations locally, reducing reliance on external services like ZooKeeper.
  • Fault Tolerance: KRaft Mode introduces a more robust and fault-tolerant design, allowing clusters to recover quickly from failures.

Comparison between ZooKeeper Mode and KRaft Mode

Let’s compare the two modes:

| Feature | ZooKeeper Mode (Legacy) | KRaft Mode (Newer Architecture) |
| --- | --- | --- |
| Leader Election | External leader election via ZooKeeper | Internal leader election among brokers |
| Broker Configuration | Stored in ZooKeeper | Stored locally on each broker |
| Fault Tolerance | Can lead to prolonged downtime or loss of data | More robust and fault-tolerant design |

While ZooKeeper Mode was the traditional setup for Kafka, KRaft Mode offers improved scalability, performance, and fault tolerance. If you’re planning to deploy a new cluster or upgrade an existing one, I recommend using KRaft Mode for its advantages over the legacy ZooKeeper-based architecture.

Ultimately, choose the mode that best fits your needs and your organization.

Finally, let’s get into the setup!

Kafka in KRaft Mode: Setting Up with Docker Compose

Here are a few prerequisites before we begin:

  • Install Docker: Ensure Docker and Docker Compose are installed and running on your system. You can download and install Docker from the official Docker website.
  • Basic understanding of Docker and Kafka
  • Prepare the Workspace: Create a directory for your project (e.g., kafka-kraft) and place the docker-compose.yml file there.

1. Define project directory kafka-kraft

Create the project directory and switch into it:

mkdir kafka-kraft && cd kafka-kraft

2. Create the create_cluster_id.sh and update_run.sh Helper Scripts

Set up shell helper scripts that generate the cluster ID and adjust the container configuration to remove the dependency on ZooKeeper.

Create the create_cluster_id.sh file, which generates the cluster ID on first launch.

touch create_cluster_id.sh

Add the following shell script to the create_cluster_id.sh file.

create_cluster_id.sh
#!/bin/bash

file_path="/tmp/clusterID/clusterID"

# Generate a cluster ID once and persist it so every broker shares the same ID
if [ ! -f "$file_path" ]; then
  /bin/kafka-storage random-uuid > "$file_path"
  echo "Cluster id has been created..."
fi
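
Once the kafka-gen service defined later in docker-compose.yml runs this script, the ID is persisted to the bind-mounted ./clusterID directory, so you can verify it from the host:

# Should print the single ID generated by kafka-storage random-uuid
cat ./clusterID/clusterID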

Create the update_run.sh file, which applies the ZooKeeper workaround and listener configuration on first launch.

touch update_run.sh

Add the following shell script to the update_run.sh file.

update_run.sh
#!/bin/sh

# Docker workaround: Remove check for KAFKA_ZOOKEEPER_CONNECT parameter
sed -i '/KAFKA_ZOOKEEPER_CONNECT/d' /etc/confluent/docker/configure

# Docker workaround: Remove check for KAFKA_ADVERTISED_LISTENERS parameter
sed -i '/dub ensure KAFKA_ADVERTISED_LISTENERS/d' /etc/confluent/docker/configure

# Docker workaround: Ignore cub zk-ready
sed -i 's/cub zk-ready/echo ignore zk-ready/' /etc/confluent/docker/ensure

file_path="/tmp/clusterID/clusterID"
interval=5 # wait interval in seconds

while [ ! -e "$file_path" ] || [ ! -s "$file_path" ]; do
  echo "Waiting for $file_path to be created..."
  sleep $interval
done

cat "$file_path"
# KRaft required step: Format the storage directory with a new cluster ID


echo "kafka-storage format --ignore-formatted -t $(cat "$file_path") -c /etc/kafka/kafka.properties" >> /etc/confluent/docker/ensure

3. Define the docker-compose.yml File

We’ll set up a simple KRaft-enabled Kafka cluster with three nodes, each running in combined mode as both broker and controller.

Create the docker-compose.yml file

touch docker-compose.yml

Add the following YAML to the docker-compose.yml file

docker-compose.yml
services:

  kafka-gen:
    image: confluentinc/cp-kafka:7.3.3
    hostname: kafka-gen
    container_name: kafka-gen
    volumes:
      - ./create_cluster_id.sh:/tmp/create_cluster_id.sh
      - ./clusterID:/tmp/clusterID
    command: "bash -c '/tmp/create_cluster_id.sh'"

  # Broker #1

  kafka1:
    image: confluentinc/cp-kafka:7.3.3
    hostname: kafka1
    container_name: kafka1
    ports:
      - "39092:39092"
    environment:
      KAFKA_LISTENERS: BROKER://kafka1:19092,EXTERNAL://kafka1:39092,CONTROLLER://kafka1:9093
      KAFKA_ADVERTISED_LISTENERS: BROKER://kafka1:19092,EXTERNAL://kafka1:39092
      KAFKA_INTER_BROKER_LISTENER_NAME: BROKER
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,BROKER:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka1:9093,2@kafka2:9093,3@kafka3:9093'
      KAFKA_METADATA_LOG_SEGMENT_MS: 15000
      KAFKA_METADATA_MAX_RETENTION_MS: 1200000
      KAFKA_METADATA_LOG_MAX_RECORD_BYTES_BETWEEN_SNAPSHOTS: 2800
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
    volumes:
      - kafka1-data:/var/lib/kafka/data
      - ./update_run.sh:/tmp/update_run.sh
      - ./clusterID:/tmp/clusterID
    command: "bash -c '/tmp/update_run.sh && /etc/confluent/docker/run'"

  # Broker #2

  kafka2:
    image: confluentinc/cp-kafka:7.3.3
    hostname: kafka2
    container_name: kafka2
    ports:
      - "39093:39093"
    environment:
      KAFKA_LISTENERS: BROKER://kafka2:19093,EXTERNAL://kafka2:39093,CONTROLLER://kafka2:9093
      KAFKA_ADVERTISED_LISTENERS: BROKER://kafka2:19093,EXTERNAL://kafka2:39093
      KAFKA_INTER_BROKER_LISTENER_NAME: BROKER
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,BROKER:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_NODE_ID: 2
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka1:9093,2@kafka2:9093,3@kafka3:9093'
      KAFKA_METADATA_LOG_SEGMENT_MS: 15000
      KAFKA_METADATA_MAX_RETENTION_MS: 1200000
      KAFKA_METADATA_LOG_MAX_RECORD_BYTES_BETWEEN_SNAPSHOTS: 2800
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
    volumes:
      - kafka2-data:/var/lib/kafka/data
      - ./update_run.sh:/tmp/update_run.sh
      - ./clusterID:/tmp/clusterID
    command: "bash -c '/tmp/update_run.sh && /etc/confluent/docker/run'"

  # Broker #3

  kafka3:
    image: confluentinc/cp-kafka:7.3.3
    hostname: kafka3
    container_name: kafka3
    ports:
      - "39094:39094"
    environment:
      KAFKA_LISTENERS: BROKER://kafka3:19094,EXTERNAL://kafka3:39094,CONTROLLER://kafka3:9093
      KAFKA_ADVERTISED_LISTENERS: BROKER://kafka3:19094,EXTERNAL://kafka3:39094
      KAFKA_INTER_BROKER_LISTENER_NAME: BROKER
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,BROKER:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_NODE_ID: 3
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka1:9093,2@kafka2:9093,3@kafka3:9093'
      KAFKA_METADATA_LOG_SEGMENT_MS: 15000
      KAFKA_METADATA_MAX_RETENTION_MS: 1200000
      KAFKA_METADATA_LOG_MAX_RECORD_BYTES_BETWEEN_SNAPSHOTS: 2800
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
    volumes:
      - kafka3-data:/var/lib/kafka/data
      - ./update_run.sh:/tmp/update_run.sh
      - ./clusterID:/tmp/clusterID
    command: "bash -c '/tmp/update_run.sh && /etc/confluent/docker/run'"

  # Kafka web UI application. Found at http://localhost:8080/ui/

  kafka-ui:
    container_name: kafka-ui
    image: provectuslabs/kafka-ui:latest
    ports:
      - 8080:8080
    depends_on:
      - kafka3
      - kafka1
      - kafka2
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka1:19092,kafka2:19093,kafka3:19094
      KAFKA_CLUSTERS_0_METRICS_PORT: 9997

volumes:
  kafka1-data:
  kafka2-data:
  kafka3-data:
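
Before starting the cluster, it can be worth validating the Compose file. docker compose config parses the file and resolves variable interpolation, surfacing YAML errors early:

docker compose config --quiet && echo "docker-compose.yml is valid"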

4. Start Kafka

Run the following command to start your Kafka setup:

docker compose -f ./docker-compose.yml up

If all goes well, your KRaft-powered Kafka instance should be up and running!
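
To confirm the brokers came up healthy, list the containers and scan a broker’s log for KRaft-related startup messages (exact log lines vary by version):

# kafka1, kafka2, kafka3, and kafka-ui should be running
# (kafka-gen exits once it has generated the cluster ID)
docker compose ps

# Look for evidence the broker started in KRaft mode
docker logs kafka1 2>&1 | grep -i kraft | head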

5. Create a Topic

Run the following command to create a new topic:

docker exec -it kafka1 kafka-topics --create --topic user.login --bootstrap-server kafka1:19092 --partitions 3 --replication-factor 1

 

Producing and Consuming Messages

With Kafka up and running, let’s produce and consume messages.

1. Create a Topic

Run the following command to create a new topic:

docker exec -ti kafka1 /usr/bin/kafka-topics --create --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094 --replication-factor 2 --partitions 4 --topic user.signup

This will create a second topic, user.signup, to which you can publish messages.

2. Produce Messages

Start a producer:

docker exec -it kafka1 kafka-console-producer --topic user.login --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094

This opens an interactive prompt where you can type messages to send to the broker; press Enter after each message.
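
By default the console producer sends value-only messages. If you want to experiment with message keys, which determine partition assignment, the console producer supports the parse.key and key.separator properties:

docker exec -it kafka1 kafka-console-producer --topic user.login --bootstrap-server kafka1:19092 --property parse.key=true --property key.separator=:

Then type messages as key:value pairs, e.g. user42:logged-in.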

3. Consume Messages

Start a consumer:

docker exec -it kafka1 kafka-console-consumer --topic user.login --from-beginning --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094

Your messages should appear in real time!
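
You can also consume as part of a named consumer group; running two consumers with the same --group value splits the topic’s partitions between them, a quick way to see partition-based scaling in action:

docker exec -it kafka1 kafka-console-consumer --topic user.login --bootstrap-server kafka1:19092 --group demo-group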

 

Monitoring Cluster

1. Check Running Topics

docker exec -it kafka1 kafka-topics --list --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094

2. Describe Topic Details

docker exec -it kafka1 kafka-topics --describe --topic user.login --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094

3. Check Controller Status

docker exec -it kafka1 kafka-metadata-quorum --bootstrap-server kafka1:19092 describe --status
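
The same tool can also report the replication state of the metadata log, showing each controller’s log end offset and lag:

docker exec -it kafka1 kafka-metadata-quorum --bootstrap-server kafka1:19092 describe --replication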

 

Troubleshooting Common Issues

1. Kafka Fails to Start

  • Ensure the storage has been formatted (kafka-storage format); in this setup, update_run.sh handles this automatically.
  • Check container logs: docker logs kafka1.

2. Producer/Consumer Issues

  • Check if the topic exists: kafka-topics --list.
  • Verify brokers are reachable on the externally mapped port (nc -zv localhost 39092).
  • Ensure advertised listeners are correctly set.
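
A quick way to inspect the listener configuration a container is actually running with is to read its environment:

docker exec kafka1 env | grep KAFKA_ADVERTISED_LISTENERS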

 

Running Kafka in KRaft mode with Docker Compose simplifies deployment by eliminating ZooKeeper. With KRaft, we get a more efficient, scalable, and streamlined Kafka setup. Whether you’re a developer testing Kafka locally or an engineer managing production workloads, KRaft mode is a major step forward.

Additional Reads