
Streamlining Airflow Deployment: Leveraging Docker Compose

Introduction

Data pipelines are a crucial element of any data project, moving data from its source to its destination and letting us monitor that flow along the way. Apache Airflow is a powerful open-source platform for creating, managing, and orchestrating these pipelines efficiently.

A particularly user-friendly approach I’ve discovered for deploying Airflow is through Docker. Docker is an application that enables the creation of containerized and isolated development environments.

To streamline the process, we will leverage Docker Compose, which allows us to run multiple containerized applications simultaneously. This is essential because Airflow is made up of several components, such as the webserver, database, scheduler, Redis, and workers.

Step 1. Download Docker Desktop

Download Link: https://www.docker.com/products/docker-desktop/

Confirm the installation was successful by running the following command in your command prompt; it should print the installed Docker version.

docker --version

Step 2. Download and Install Docker Compose

As mentioned, Docker Compose lets us run multiple containerized applications simultaneously, with all of Airflow's services defined in a single docker-compose.yaml file.
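To make that concrete, here is a trimmed, illustrative outline of the services the official Airflow docker-compose.yaml defines; the real file also pins images, environment variables, healthchecks, and a few extra services, so treat this as a sketch rather than a copy of the file.

# Illustrative outline only -- the official Airflow docker-compose.yaml is much longer
services:
  postgres:           # metadata database
    image: postgres:13
  redis:              # message broker for the Celery workers
    image: redis:7
  airflow-webserver:  # serves the UI on port 8080
    image: apache/airflow:2.8.1
  airflow-scheduler:  # schedules and triggers DAG runs
    image: apache/airflow:2.8.1
  airflow-worker:     # Celery worker that executes the tasks
    image: apache/airflow:2.8.1
  airflow-init:       # one-off service that migrates the DB and creates the admin user
    image: apache/airflow:2.8.1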

If you're using Windows, ensure you have curl installed by running the command below in your command prompt. curl is a command-line tool and library for transferring data with URLs.

curl --version

If curl is installed, you can use it to download the Docker Compose binary directly from the Compose releases page on GitHub.
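A sketch of that download command follows; the version number and file name are examples, so substitute the release you actually want (recent Docker Desktop builds already bundle Compose v2, in which case this step may be unnecessary).

curl -L https://github.com/docker/compose/releases/download/v2.24.5/docker-compose-windows-x86_64.exe -o docker-compose.exe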

Alternatively, if curl is not installed, you can fetch the binary another way.
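One curl-free option on Windows is PowerShell's built-in Invoke-WebRequest; as above, treat the version and file name as examples.

Invoke-WebRequest -Uri "https://github.com/docker/compose/releases/download/v2.24.5/docker-compose-windows-x86_64.exe" -OutFile "docker-compose.exe"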

Confirm successful installation:

docker-compose --version

Step 3. Creating directories

Mount local directories (dags, logs, plugins, and config) to their counterparts inside the Airflow containers. This keeps your local development environment and the containers in sync.
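If you don't already have a docker-compose.yaml in your project, it can be fetched from the Airflow quick-start docs; the version in the URL below is only an example, so substitute the Airflow release you intend to run.

curl -LfO "https://airflow.apache.org/docs/apache-airflow/2.8.1/docker-compose.yaml"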

You should see a volumes section within the docker-compose.yaml file that maps these directories into the containers.
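In recent versions of the official file that section looks roughly like this (the exact paths and variables vary by Airflow release, so treat it as a sketch):

  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins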

Create these directories in your project. Run the command below in your terminal.

mkdir logs dags plugins config

Step 4. Initialize Docker Compose

Run the following command in your VSCode terminal to initialize Airflow. This sets up the metadata database and creates the first user account.

docker-compose up airflow-init

Upon successful initialization, you'll see the message "User 'airflow' created with role 'Admin'".

Step 5. Start the engines

Within your code editor terminal, run the following command:

docker compose up

This tells Docker Compose to start all the containers defined in your configuration; you should see the container logs streaming in your terminal.
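To check that everything is up, you can list the Compose services from a second terminal; the service names shown will mirror whatever is defined in your docker-compose.yaml.

docker compose ps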

Remember: For security reasons, change the default “airflow” username and password after initial setup. Refer to the Airflow web UI documentation for instructions.
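If you're using the official compose file, one way to avoid the default credentials in the first place is to set the variables the airflow-init service reads, for example in a .env file next to docker-compose.yaml (these variable names assume the official file; adjust if yours differs):

_AIRFLOW_WWW_USER_USERNAME=myadmin
_AIRFLOW_WWW_USER_PASSWORD=choose-a-strong-password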

Now you’re ready to create your data pipelines! Place your DAG files in the ./dags directory within your project.
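As a quick smoke test, a minimal DAG like the sketch below (saved as, say, dags/hello_dag.py; the file name and task are made up for illustration) should show up in the UI shortly after the scheduler picks it up.

# dags/hello_dag.py -- minimal example DAG to verify the setup
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_dag",
    start_date=datetime(2024, 1, 1),
    schedule=None,      # trigger it manually from the UI
    catchup=False,
) as dag:
    # A single task that just echoes a greeting
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow!'",
    )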

To access the Airflow web UI and manage your pipelines, open your web browser and navigate to http://localhost:8080.

Step 6. Killing the containers

Run the following command in the terminal:

docker compose down

Verify that the containers have stopped successfully.
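One simple check is to list the running containers again; after docker compose down, the Airflow services should no longer appear.

docker ps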

This gives you a quick and easy way to use Airflow for your data pipelines.

Resuming your Airflow Project

  1. Launch Docker Desktop.
  2. Open your IDE and navigate to your project directory.
  3. In your terminal, run: docker-compose up -d. This starts the containers in detached mode, allowing you to close the terminal.
  4. Access the Airflow web UI at http://localhost:8080.
  5. Continue working on your project.

Conclusion


Streamlining Apache Airflow deployment with Docker Compose is an efficient, low-friction way to manage data pipelines. By following the step-by-step guide outlined in this blog, users can unlock the full potential of Airflow's capabilities while harnessing the simplicity and versatility of Docker Compose.

This approach not only simplifies deployment but also enhances scalability and maintainability, empowering teams to focus on the core aspects of their data projects without being bogged down by deployment complexities. As data continues to play a pivotal role in driving insights and innovation, mastering tools like Airflow and Docker Compose becomes indispensable for staying ahead in today’s rapidly evolving landscape.
