Running Kafka in KRaft Mode with Docker Compose

Apache Kafka has become the undisputed champion of event streaming, message brokering, and log processing. It moves live data through the system almost instantly, handles millions of messages per second, and scales from small projects to massive deployments with high throughput and low latency. But until recently, it came with a not-so-tiny caveat: ZooKeeper. While ZooKeeper did an excellent job managing metadata and coordination, it was also an extra dependency that engineers had to deploy, maintain, and troubleshoot. Enter KRaft (Kafka Raft), a mode that eliminates the need for the ZooKeeper coordination service, making Kafka leaner, more scalable, and easier to manage.
Let’s walk through setting up Kafka in KRaft mode using Docker Compose. We’ll cover:
- Kafka and KRaft basics
- Key Kafka concepts
- Setting up Kafka in KRaft mode within Docker Compose
- Code examples to get you started
Let’s get started!
Why Apache Kafka?
Apache Kafka is a powerful distributed streaming platform that has become a popular choice for many organizations. Here are some compelling reasons why people use Kafka:
- Real-Time Data Processing: With Kafka, data flows through the system almost instantly, making it ideal for live data streams and real-time analytics.
- Reliability and Scalability: Kafka can handle millions of data messages every second, making it an excellent choice for both small and large-scale systems that require high throughput and low latency.
- Fault Tolerance: If something goes wrong, Kafka is designed to recover quickly and keep the system running smoothly, ensuring that your data remains safe and intact.
- Decoupling of Systems: Kafka allows different systems to send and receive data without requiring them to know too much about each other’s inner workings, promoting flexibility and maintainability in complex architectures.
Understanding Key Kafka Concepts
Before diving deeper into Apache Kafka, let’s review some fundamental concepts:
- Producer: The system or service that sends data to Kafka, similar to the sender in a mailing system.
- Consumer: The system that reads or takes data from Kafka, serving as the recipient.
- Topic: A channel or folder where Kafka stores messages. Producers send data to topics, and consumers read data from them. Think of topics as categories for your data, like “user-logins”, “orders”, or “payments”.
- Broker: A Kafka server that holds and manages messages within topics.
- Partition: Topics can be split into smaller pieces called partitions, which helps Kafka handle large amounts of data more efficiently.
- Kafka Connect (not covered in this article): A framework for integrating Kafka with other systems, enabling data ingestion and extraction.
- Serialization (not covered in this article): Kafka supports various serialization formats for messages, including String, JSON, and Avro, allowing flexibility in how data is represented.
- KRaft Controllers: In KRaft mode, special nodes called controllers manage metadata and coordination. Unlike ZooKeeper, KRaft controllers are built into Kafka itself.
These core concepts form the foundation of Apache Kafka’s architecture, enabling the platform to process high volumes of data in a scalable, fault-tolerant manner.
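To make these ideas concrete, here’s a small sketch using the console producer. The topic name `orders`, the three-partition assumption, and the broker address `localhost:9092` are all illustrative:

```bash
# Hypothetical topic "orders" with 3 partitions.
# Messages sharing a key are hashed to the same partition,
# so per-key ordering is preserved.
kafka-console-producer --topic orders --bootstrap-server localhost:9092 \
  --property parse.key=true --property key.separator=:
# Then type lines such as:
#   user42:created   <- key "user42" maps to one partition
#   user42:paid      <- same key, same partition, stays in order
```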
Why Kafka in KRaft Mode?
Kafka in KRaft mode (Kafka Raft) replaces ZooKeeper with an internal Raft consensus mechanism, which is:
- Simpler: No separate Zookeeper deployment required.
- More scalable: Kafka can better handle metadata without external coordination.
- More efficient: Reduces operational complexity and failure points.
- Faster: Improved metadata handling and startup times.
This makes KRaft a game-changer for Kafka deployments.
Modes: A Comparison between ZooKeeper and KRaft
Kafka has undergone significant changes over the years to improve its scalability, reliability, and fault tolerance. Two prominent modes of operation for Kafka are ZooKeeper Mode and KRaft Mode. Let’s delve into the details of each mode and compare their key features.
ZooKeeper Mode (Legacy Mode)
In the past, Kafka relied on Apache ZooKeeper for leader election, configuration management, and other administrative tasks. This setup is often called “Legacy Mode” or “ZooKeeper Mode”.
- Leader Election: Each broker in the cluster maintains its own ZooKeeper connection to elect leaders and manage configuration.
- Broker Configuration: ZooKeeper stores broker configurations, including metadata, topic assignments, and other administrative data.
- Fault Tolerance: When the controller fails, the new controller must reload the full cluster metadata from ZooKeeper before taking over, which can prolong recovery on large clusters.
KRaft Mode (Newer Architecture)
KRaft Mode arrived as early access in Kafka 2.8 and was declared production-ready in Kafka 3.3. This newer architecture replaces the traditional ZooKeeper-based setup and aims to improve scalability, performance, and fault tolerance.
- Leader Election: A quorum of controller nodes elects a leader internally using the Raft protocol, eliminating the need for ZooKeeper.
- Metadata Storage: Cluster metadata lives in an internal, replicated metadata log managed by the controller quorum, reducing reliance on external services like ZooKeeper.
- Fault Tolerance: KRaft Mode introduces a more robust and fault-tolerant design; standby controllers already hold up-to-date metadata, so clusters recover quickly from failures.
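As a sketch of what this looks like in broker configuration, here are the core KRaft settings from a minimal `server.properties` (node IDs and hostnames are illustrative; in Docker these surface as the `KAFKA_*` environment variables we’ll use below):

```properties
process.roles=broker,controller        # this node acts as both broker and controller
node.id=1                              # unique node ID; replaces ZooKeeper broker registration
controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093  # the Raft voter set
controller.listener.names=CONTROLLER   # listener dedicated to Raft/controller traffic
listeners=PLAINTEXT://kafka1:9092,CONTROLLER://kafka1:9093
```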
Comparison between ZooKeeper Mode and KRaft Mode
Let’s compare the two modes:

| Feature | ZooKeeper Mode (Legacy) | KRaft Mode (Newer Architecture) |
| --- | --- | --- |
| Leader Election | External, via ZooKeeper | Internal, via a Raft quorum of controllers |
| Metadata Storage | Stored in ZooKeeper | Stored in an internal, replicated metadata log |
| Fault Tolerance | Slow controller failover; metadata reloaded from ZooKeeper | More robust design; standby controllers recover quickly |
ZooKeeper Mode was the traditional setup for Kafka, but KRaft Mode offers improved scalability, performance, and fault tolerance. If you’re planning to deploy a new cluster or upgrade an existing one, I recommend using KRaft Mode for its advantages over the legacy ZooKeeper-based architecture.
Ultimately, choose the mode that best fits your needs and your organization.
Finally, let’s get into the setup!
Kafka in KRaft Mode: Setting Up with Docker Compose
Here are a few prerequisites before we begin:
- Install Docker: Ensure Docker and Docker Compose are installed and running on your system. You can download and install Docker from the official Docker website.
- A basic understanding of Docker and Kafka.
- Prepare the Workspace: Create a directory for your project (e.g., `kafka-kraft`); the `docker-compose.yml` file will live there.
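You can confirm the tooling is in place before continuing:

```bash
docker --version         # Docker Engine should be installed and running
docker compose version   # Compose v2 provides the "docker compose" subcommand
```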
1. Define the project directory `kafka-kraft`

Create and enter the project directory:

```bash
mkdir kafka-kraft && cd kafka-kraft
```
2. Create the helper scripts `create_cluster_id.sh` and `update_run.sh`

Set up two shell helper files: one creates the cluster ID, and one adjusts the container configuration to remove the dependency on ZooKeeper.

Create the `create_cluster_id.sh` file, which generates the cluster ID on first launch:

```bash
touch create_cluster_id.sh
```

Add the following shell script to the `create_cluster_id.sh` file:
```bash
#!/bin/bash
# Generate a shared cluster ID once; all brokers read it from this mounted path
file_path="/tmp/clusterID/clusterID"

if [ ! -f "$file_path" ]; then
  /bin/kafka-storage random-uuid > "$file_path"
  echo "Cluster id has been created..."
fi
```
Create the `update_run.sh` file, which applies the ZooKeeper workarounds and listener configuration on first launch:

```bash
touch update_run.sh
```

Add the following shell script to the `update_run.sh` file:
```sh
#!/bin/sh
# Docker workaround: Remove check for KAFKA_ZOOKEEPER_CONNECT parameter
sed -i '/KAFKA_ZOOKEEPER_CONNECT/d' /etc/confluent/docker/configure

# Docker workaround: Remove check for KAFKA_ADVERTISED_LISTENERS parameter
sed -i '/dub ensure KAFKA_ADVERTISED_LISTENERS/d' /etc/confluent/docker/configure

# Docker workaround: Ignore cub zk-ready
sed -i 's/cub zk-ready/echo ignore zk-ready/' /etc/confluent/docker/ensure

# Wait until kafka-gen has written the shared cluster ID
file_path="/tmp/clusterID/clusterID"
interval=5 # wait interval in seconds

while [ ! -e "$file_path" ] || [ ! -s "$file_path" ]; do
  echo "Waiting for $file_path to be created..."
  sleep $interval
done
cat "$file_path"

# KRaft required step: Format the storage directory with the shared cluster ID
echo "kafka-storage format --ignore-formatted -t $(cat "$file_path") -c /etc/kafka/kafka.properties" >> /etc/confluent/docker/ensure
```
3. Define the `docker-compose.yml` file

We’ll set up a simple KRaft-enabled Kafka cluster with three brokers, each of which also acts as a controller.

Create the `docker-compose.yml` file:

```bash
touch docker-compose.yml
```

Add the following YAML to the `docker-compose.yml` file:
```yaml
services:
  kafka-gen:
    image: confluentinc/cp-kafka:7.3.3
    hostname: kafka-gen
    container_name: kafka-gen
    volumes:
      - ./create_cluster_id.sh:/tmp/create_cluster_id.sh
      - ./clusterID:/tmp/clusterID
    command: "bash -c '/tmp/create_cluster_id.sh'"

  # Broker #1
  kafka1:
    image: confluentinc/cp-kafka:7.3.3
    hostname: kafka1
    container_name: kafka1
    ports:
      - "39092:39092"
    environment:
      KAFKA_LISTENERS: BROKER://kafka1:19092,EXTERNAL://kafka1:39092,CONTROLLER://kafka1:9093
      KAFKA_ADVERTISED_LISTENERS: BROKER://kafka1:19092,EXTERNAL://kafka1:39092
      KAFKA_INTER_BROKER_LISTENER_NAME: BROKER
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,BROKER:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka1:9093,2@kafka2:9093,3@kafka3:9093'
      KAFKA_METADATA_LOG_SEGMENT_MS: 15000
      KAFKA_METADATA_MAX_RETENTION_MS: 1200000
      KAFKA_METADATA_LOG_MAX_RECORD_BYTES_BETWEEN_SNAPSHOTS: 2800
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
    volumes:
      - kafka1-data:/var/lib/kafka/data
      - ./update_run.sh:/tmp/update_run.sh
      - ./clusterID:/tmp/clusterID
    command: "bash -c '/tmp/update_run.sh && /etc/confluent/docker/run'"

  # Broker #2
  kafka2:
    image: confluentinc/cp-kafka:7.3.3
    hostname: kafka2
    container_name: kafka2
    ports:
      - "39093:39093"
    environment:
      KAFKA_LISTENERS: BROKER://kafka2:19093,EXTERNAL://kafka2:39093,CONTROLLER://kafka2:9093
      KAFKA_ADVERTISED_LISTENERS: BROKER://kafka2:19093,EXTERNAL://kafka2:39093
      KAFKA_INTER_BROKER_LISTENER_NAME: BROKER
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,BROKER:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_NODE_ID: 2
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka1:9093,2@kafka2:9093,3@kafka3:9093'
      KAFKA_METADATA_LOG_SEGMENT_MS: 15000
      KAFKA_METADATA_MAX_RETENTION_MS: 1200000
      KAFKA_METADATA_LOG_MAX_RECORD_BYTES_BETWEEN_SNAPSHOTS: 2800
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
    volumes:
      - kafka2-data:/var/lib/kafka/data
      - ./update_run.sh:/tmp/update_run.sh
      - ./clusterID:/tmp/clusterID
    command: "bash -c '/tmp/update_run.sh && /etc/confluent/docker/run'"

  # Broker #3
  kafka3:
    image: confluentinc/cp-kafka:7.3.3
    hostname: kafka3
    container_name: kafka3
    ports:
      - "39094:39094"
    environment:
      KAFKA_LISTENERS: BROKER://kafka3:19094,EXTERNAL://kafka3:39094,CONTROLLER://kafka3:9093
      KAFKA_ADVERTISED_LISTENERS: BROKER://kafka3:19094,EXTERNAL://kafka3:39094
      KAFKA_INTER_BROKER_LISTENER_NAME: BROKER
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,BROKER:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_NODE_ID: 3
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka1:9093,2@kafka2:9093,3@kafka3:9093'
      KAFKA_METADATA_LOG_SEGMENT_MS: 15000
      KAFKA_METADATA_MAX_RETENTION_MS: 1200000
      KAFKA_METADATA_LOG_MAX_RECORD_BYTES_BETWEEN_SNAPSHOTS: 2800
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
    volumes:
      - kafka3-data:/var/lib/kafka/data
      - ./update_run.sh:/tmp/update_run.sh
      - ./clusterID:/tmp/clusterID
    command: "bash -c '/tmp/update_run.sh && /etc/confluent/docker/run'"

  # Kafka web UI application. Found at http://localhost:8080/ui/
  kafka-ui:
    container_name: kafka-ui
    image: provectuslabs/kafka-ui:latest
    ports:
      - 8080:8080
    depends_on:
      - kafka1
      - kafka2
      - kafka3
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka1:19092,kafka2:19093,kafka3:19094
      KAFKA_CLUSTERS_0_METRICS_PORT: 9997

volumes:
  kafka1-data:
  kafka2-data:
  kafka3-data:
```
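One caveat about connectivity: the EXTERNAL listeners advertise the container hostnames (`kafka1`, `kafka2`, `kafka3`). Clients inside the Compose network resolve these automatically, but to connect from your host machine you may need to map the names to localhost, for example:

```bash
# Append to /etc/hosts on the host machine (requires sudo)
127.0.0.1 kafka1 kafka2 kafka3
```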
4. Start Kafka

Run the following command to start your Kafka setup:

```bash
docker compose -f ./docker-compose.yml up
```

If all goes well, your KRaft-powered Kafka cluster should be up and running!
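You can verify with a quick status check; the exact log wording varies by version, so treat the grep as a loose filter:

```bash
docker compose ps                        # brokers should be running (kafka-gen exits after writing the cluster ID)
docker logs kafka1 2>&1 | grep -i raft   # look for KRaft/quorum startup messages
```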
5. Create a Topic

Run the following command to create a new topic:

```bash
docker exec -it kafka1 kafka-topics --create --topic user.login \
  --bootstrap-server kafka1:19092 --partitions 3 --replication-factor 1
```
Producing and Consuming Messages
With Kafka up and running, let’s produce and consume messages.
1. Create a Topic

We already created `user.login` above; the same command also lets you tune replication and partitioning. For example, to create a `user.signup` topic with a replication factor of 2 and 4 partitions:

```bash
docker exec -it kafka1 /usr/bin/kafka-topics --create \
  --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094 \
  --replication-factor 2 --partitions 4 --topic user.signup
```

This creates a topic where you can publish messages.
2. Produce Messages

Start a producer:

```bash
docker exec -it kafka1 kafka-console-producer --topic user.login \
  --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094
```

This opens an interactive prompt where you can type a message to send to the brokers. Press Enter after each message.
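If you prefer a non-interactive approach, you can pipe messages straight into the producer; note the `-i` flag so stdin is forwarded into the container:

```bash
# Send a single message without an interactive session (sample JSON payload)
echo '{"user":"amy","event":"login"}' | docker exec -i kafka1 \
  kafka-console-producer --topic user.login --bootstrap-server kafka1:19092
```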
3. Consume Messages

Start a consumer:

```bash
docker exec -it kafka1 kafka-console-consumer --topic user.login --from-beginning \
  --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094
```
Your messages should appear in real time!
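To watch consumer-group balancing in action, start two consumers with the same group ID in separate terminals; the topic’s partitions will be divided between them (the group name `login-readers` is illustrative):

```bash
# Run in two terminals; each consumer receives a subset of the partitions
docker exec -it kafka1 kafka-console-consumer --topic user.login \
  --group login-readers \
  --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094
```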
Monitoring the Cluster

1. List Topics

```bash
docker exec -it kafka1 kafka-topics --list \
  --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094
```

2. Describe Topic Details

```bash
docker exec -it kafka1 kafka-topics --describe --topic user.login \
  --bootstrap-server kafka1:19092,kafka2:19093,kafka3:19094
```

3. Check Controller Status

The `kafka-metadata-quorum` tool (available since Kafka 3.3, which ships in this image) reports the state of the KRaft controller quorum:

```bash
docker exec -it kafka1 kafka-metadata-quorum \
  --bootstrap-server kafka1:19092 describe --status
```
Troubleshooting Common Issues

1. Kafka Fails to Start

- Ensure the storage has been formatted (`kafka-storage format`); in this setup, `update_run.sh` takes care of it.
- Check the container logs: `docker logs kafka1`.

2. Producer/Consumer Issues

- Check that the topic exists: `kafka-topics --list`.
- Verify the brokers are reachable from the host (e.g., `nc -zv localhost 39092`).
- Ensure advertised listeners are correctly set. A quick diagnostic pass follows below.
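Here are the diagnostics, assuming this article’s port mappings (39092-39094 exposed on the host):

```bash
# From the host: is kafka1's EXTERNAL listener reachable?
nc -zv localhost 39092

# From inside the container: does the broker answer API requests?
docker exec -it kafka1 kafka-broker-api-versions --bootstrap-server kafka1:19092
```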
Running Kafka in KRaft mode with Docker Compose simplifies deployment by eliminating ZooKeeper. With KRaft, we get a more efficient, scalable, and streamlined Kafka setup. Whether you’re a developer testing Kafka locally or an engineer managing production workloads, KRaft mode is a major step forward.