Apache Kafka: Docker Quick Start

Apache Kafka is a distributed streaming platform that can act as a message broker, as the heart of a stream processing pipeline, or as the backbone of a large enterprise data synchronization system. Kafka is highly available and fault tolerant, and it is designed to handle much higher throughput than traditional message brokers such as RabbitMQ or ActiveMQ.

In this tutorial, you will use Docker and Docker Compose to run Apache Kafka and ZooKeeper. Docker with Docker Compose is the quickest way to get started with Apache Kafka and to experiment with clustering and the fault-tolerant properties Kafka provides. A full Docker Compose setup with 3 Kafka brokers and 1 ZooKeeper node can be found here.

Prerequisites

To complete this tutorial, you will need:

  • A UNIX environment (Mac or Linux)
  • Docker & Docker Compose

Note: Docker can be installed by following the official installation guide.

System Architecture

Before running Kafka with Docker, let's examine the architecture of a simple Apache Kafka setup.

  • Kafka Cluster: A group of Kafka brokers forming a distributed system
  • Kafka Broker: An instance of Kafka that holds topics of data
  • ZooKeeper: A centralized system for storing and managing configuration
  • Producer: A client that sends messages to a Kafka topic
  • Consumer: A client that reads messages from a Kafka topic

Kafka uses ZooKeeper to manage and coordinate the brokers in a cluster. Producers and consumers are the main clients that interact with Kafka; we'll take a look at them once we have a running Kafka broker.

Architecture diagram of integrations used in this tutorial

The above diagram shows the architecture of the systems we are going to run in this tutorial. It also shows how Kafka brokers use ZooKeeper and which ports the running services expose. We'll start by running one Apache Kafka broker and one ZooKeeper node (seen above in blue). Later on, we'll form a three-node cluster by adding two more Kafka brokers (seen above in green).

Running ZooKeeper in Docker

Ensure you have Docker installed and running. You can verify the installation by running the following command; you should see similar output.

docker -v
> Docker version 18.09.2, build 6247962

Additionally, verify you have Docker Compose installed:

docker-compose -v
> docker-compose version 1.23.2, build 1110ad01
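
If you'd like to script this check, here is a minimal sketch. The helper name missing_tools is hypothetical and not part of Docker; it only tests whether the two commands are on your PATH. Keep in mind that docker -v reports the client version only, so it succeeds even when the Docker daemon is stopped. Running docker info is one way to confirm the daemon is reachable, since it reports an error when it cannot connect.

```shell
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is an illustrative helper, not a Docker command.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

missing=$(missing_tools docker docker-compose)
if [ -n "$missing" ]; then
  echo "Not found on PATH: $missing"
else
  echo "docker and docker-compose are both installed"
fi
```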

We're ready to begin! Create a directory, such as ~/kafka, to store our Docker Compose files. Using your favorite text editor or IDE, create a file named docker-compose.yml in your new directory.

We'll start by getting ZooKeeper running. In the Docker Compose YAML file, define a zookeeper service as shown below:

version: '3'

services:
  zookeeper:
    image: zookeeper:3.4.9
    hostname: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=zookeeper:2888:3888
    volumes:
      - ./data/zookeeper/data:/data
      - ./data/zookeeper/datalog:/datalog

A brief overview of what we're defining:

  • Line 1: The Docker Compose file format version, set to 3
  • Line 4: The start of the zookeeper service definition
  • Line 5: The Docker image to use for ZooKeeper and its version
  • Line 6: The hostname the container will use when running
  • Lines 7-8: The port to expose to the host; 2181 is ZooKeeper's default client port
  • Line 10: The unique ID of this ZooKeeper instance, set to 1
  • Line 11: The port this ZooKeeper instance should run with
  • Line 12: The list of ZooKeeper servers; in our case just one
  • Lines 13-15: Mapping volumes on the host to store ZooKeeper data

Note: We've mapped ./data/zookeeper on the host to directories within the container. This allows ZooKeeper to persist data even if you destroy the container.

We can now start ZooKeeper by running the following command in the directory containing the docker-compose.yml file:

docker-compose up

Logs will start printing, and should end with a line similar to this:

zookeeper_1  | ... binding to port 0.0.0.0/0.0.0.0:2181

Congrats! ZooKeeper is running and exposed on port 2181. You can verify this using netcat in a new terminal window:

echo ruok | nc localhost 2181
> imok
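
When ZooKeeper is started from a script, it may not be ready the instant the container is up. The sketch below wraps the same ruok check in a retry loop. The function names zk_ruok and wait_for_zookeeper are hypothetical, and the sketch assumes nc is installed and that the ruok command is enabled, as it is in the ZooKeeper image used here.

```shell
# Send ZooKeeper's "ruok" four-letter command and print the reply.
# (zk_ruok and wait_for_zookeeper are illustrative names.)
zk_ruok() {
  echo ruok | nc "$1" "$2" 2>/dev/null
}

# Poll once per second until ZooKeeper answers "imok",
# or give up after the given number of attempts.
wait_for_zookeeper() {
  host=$1
  port=$2
  attempts=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(zk_ruok "$host" "$port")" = "imok" ]; then
      echo "ZooKeeper is up on $host:$port"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "ZooKeeper did not answer on $host:$port" >&2
  return 1
}
```

For example, calling wait_for_zookeeper localhost 2181 30 would wait up to roughly 30 seconds before giving up.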

Running Kafka in Docker

We can now add our first kafka service to our Docker Compose file. We're calling this kafka2 as it will have a broker id of 2 and run on the default port of 9092. Later on, we'll add in kafka1 and kafka3. This is to demonstrate that order does not matter and broker ids are just for identification.

version: '3'

services:
...
  kafka2:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka2
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka2:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: 2
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka2/data:/var/lib/kafka/data
    depends_on:
      - zookeeper

If you prefer, copy the full gist found here. A brief overview of what we're defining:

  • Line 6: The Docker image to use for Kafka; we're using the Confluent image
  • Line 7: The hostname this Kafka broker will use when running
  • Lines 8-9: The ports to expose; set to Kafka's default (9092)
  • Line 11: Kafka's advertised listeners. Robin Moffatt has a great blog post about this.
  • Line 12: Security protocols to use for each listener.
  • Line 13: The inter-broker listener name (used for internal communication)
  • Line 14: The list of ZooKeeper nodes Kafka should use
  • Line 15: The broker ID of this Kafka broker.
  • Line 16: The replication factor of the consumer offset topic (1 for one broker)
  • Lines 17-18: Mapping volumes on the host to store Kafka data
  • Lines 19-20: Start the ZooKeeper service before the Kafka service
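
The advertised listeners value is the densest part of this configuration, so it can help to pull it apart. Each entry has the form NAME://host:port: the internal listener (kafka2:19092) is the address other containers use, and the external listener is the address clients on your host use. The ${DOCKER_HOST_IP:-127.0.0.1} part falls back to 127.0.0.1 when DOCKER_HOST_IP is not set. Shell uses the same default-value syntax, which the sketch below relies on; list_listeners is a hypothetical helper written for this illustration.

```shell
# Split a KAFKA_ADVERTISED_LISTENERS value into one "NAME host port" line
# per listener. (list_listeners is an illustrative helper.)
list_listeners() {
  printf '%s\n' "$1" | tr ',' '\n' |
    sed -E 's#^([A-Z_]+)://(.*):([0-9]+)$#\1 \2 \3#'
}

# Same fallback as in the Compose file: use 127.0.0.1 unless DOCKER_HOST_IP is set.
external_host="${DOCKER_HOST_IP:-127.0.0.1}"

list_listeners "LISTENER_DOCKER_INTERNAL://kafka2:19092,LISTENER_DOCKER_EXTERNAL://${external_host}:9092"
# With DOCKER_HOST_IP unset, this prints:
#   LISTENER_DOCKER_INTERNAL kafka2 19092
#   LISTENER_DOCKER_EXTERNAL 127.0.0.1 9092
```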

Let's start the Kafka broker! In a new terminal window, run the following command in the same directory:

docker-compose up

ZooKeeper should still be running in another terminal, and if it isn't, Docker Compose will start it. You'll see a lot of logs being printed and then Kafka should be running! We can verify this by creating a topic.

If you have the Kafka command line tools installed, run:

kafka-topics --zookeeper localhost:2181 --create --topic new-topic --partitions 1 --replication-factor 1
> Created topic "new-topic".

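If you don't have the Kafka command line tools installed, you can run the same command inside the broker container using Docker. The container name below assumes your Docker Compose directory is named kafka; check docker ps for the actual name: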

docker exec -it kafka_kafka2_1 kafka-topics --zookeeper zookeeper:2181 --create --topic new-topic --partitions 1 --replication-factor 1
> Created topic "new-topic".

If you get any errors, verify both Kafka and ZooKeeper are running with docker ps and check the logs from the terminals running Docker Compose.

Yay! You now have the simplest Kafka cluster running within Docker. Kafka with broker id 2 is exposed on port 9092 and ZooKeeper on port 2181. Data for this Kafka cluster is stored in ./data/kafka2.
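
With a broker running, you can try out the producer and consumer roles from the architecture section using the console clients that ship in the Confluent image. The following is a sketch rather than a definitive recipe: produce_hello and consume_all are hypothetical function names, the container name assumes a project directory called kafka, and the commands connect through the internal listener because they run inside the container.

```shell
# Container and listener names come from the Compose file above; the
# container name assumes the project directory is called "kafka".
CONTAINER=kafka_kafka2_1
BROKER=kafka2:19092   # internal listener, reachable from inside the container

# Set DOCKER=echo to print the docker commands instead of running them.
DOCKER=${DOCKER:-docker}

# Produce two messages to new-topic (illustrative helper).
produce_hello() {
  printf 'hello\nkafka\n' |
    "$DOCKER" exec -i "$CONTAINER" kafka-console-producer \
      --broker-list "$BROKER" --topic new-topic
}

# Read new-topic from the beginning, exiting after 10 seconds of silence
# (illustrative helper).
consume_all() {
  "$DOCKER" exec -i "$CONTAINER" kafka-console-consumer \
    --bootstrap-server "$BROKER" --topic new-topic \
    --from-beginning --timeout-ms 10000
}
```

Running produce_hello followed by consume_all should echo the two messages back, assuming new-topic was created as shown earlier.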

To stop the containers, press Ctrl+C in the terminal windows running Docker Compose (this is Ctrl+C on a Mac as well, not Cmd+C). If they don't stop, you can run docker-compose down, which stops and removes the containers. To remove any stopped containers that are left over, you can run docker-compose rm.

Running Three Kafka Brokers in Docker

To run three brokers, we need to add two more kafka services to our Docker Compose file. We'll run broker 1 on port 9091 and broker 3 on port 9093.

Add two more services like so:

version: '3'

services:
...
  kafka1:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka1
    ports:
      - "9091:9091"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka1:19091,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9091
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka1/data:/var/lib/kafka/data
    depends_on:
      - zookeeper

  kafka3:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka3
    ports:
      - "9093:9093"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka3:19093,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka3/data:/var/lib/kafka/data
    depends_on:
      - zookeeper

You can find a full gist with ZooKeeper and three Kafka brokers here. Essentially, each broker differs only in its hostname, its ports, its broker ID, and its data directory on the host.
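
Because the broker definitions differ only in a single digit, they can be generated from a template. The sketch below is one way to do that; kafka_service is a hypothetical helper, and it assumes broker numbers 1 through 9 so that the ports follow the 909N and 1909N pattern used in this tutorial.

```shell
# Print a Compose service definition for broker number $1 (1-9).
# (kafka_service is an illustrative helper; paste its output under "services:".)
kafka_service() {
  id=$1
  cat <<EOF
  kafka${id}:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka${id}
    ports:
      - "909${id}:909${id}"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka${id}:1909${id},LISTENER_DOCKER_EXTERNAL://\${DOCKER_HOST_IP:-127.0.0.1}:909${id}
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: ${id}
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka${id}/data:/var/lib/kafka/data
    depends_on:
      - zookeeper
EOF
}

# Print the definition for broker 1.
kafka_service 1
```

The backslash before ${DOCKER_HOST_IP...} keeps the shell from expanding it, so the variable reference is written to the output unchanged for Docker Compose to resolve.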

Note: In a production setup, you'd want the offset topic replication factor to be set higher than 1, but for the purposes of this tutorial I've left it at one since we started with one broker.

We can now verify that all three brokers are running by creating a topic with a replication factor of 3:

docker exec -it kafka_kafka2_1 kafka-topics --zookeeper zookeeper:2181 --create --topic three-isr --partitions 1 --replication-factor 3
> Created topic "three-isr".

If you receive an error, ensure all three Kafka brokers are running. Woohoo! You've now got a Kafka cluster with three brokers running.

Conclusion

Congrats! You've successfully started a local Kafka cluster using Docker and Docker Compose. Data is persisted outside of the containers on the local machine, which means you can delete containers and restart them without losing data. For next steps, I'd suggest playing around with Kafka's fault tolerance and replication features.

For example, you could create a topic with a replication factor of 3, produce some data, stop and remove broker 2, delete broker 2's data directory (./data/kafka2), and then start broker 2 again to see the data replicated to the fresh broker. Pretty cool!
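
The stop, delete, and restart steps of that experiment could be scripted roughly as follows. This is a sketch under a few assumptions: run and rebuild_broker are hypothetical helper names, the commands are run from the directory containing docker-compose.yml, and the service name matches its data directory (kafka2 and ./data/kafka2). On Linux the data directory may be owned by root, in which case the rm step would need sudo.

```shell
# Run a command, or just print it when DRY_RUN=1. (Illustrative helper.)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop a broker, discard its container and data, then start it again empty.
# (rebuild_broker is an illustrative helper.)
rebuild_broker() {
  service=$1
  run docker-compose stop "$service"
  run docker-compose rm -f "$service"
  run rm -rf "./data/$service"
  run docker-compose up -d "$service"
}

# Preview the plan for broker 2 without touching anything.
( DRY_RUN=1; rebuild_broker kafka2 )
# prints:
#   + docker-compose stop kafka2
#   + docker-compose rm -f kafka2
#   + rm -rf ./data/kafka2
#   + docker-compose up -d kafka2
```

After a real run, running kafka-topics with --describe against the three-isr topic is one way to check whether broker 2 has rejoined the in-sync replicas.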

For full sets of Docker Compose files for running various Kafka cluster setups, check out Stephane Maarek's kafka-stack-docker-compose repository, which inspired this post.
