Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/29 02:15:14 UTC

[GitHub] [airflow] resdevd opened a new pull request #8621: Docker Compose CeleryExecutor example

resdevd opened a new pull request #8621:
URL: https://github.com/apache/airflow/pull/8621


   This is a docker-compose template ideal for local development, using the official Airflow Docker images, with easy deploy instructions. It addresses issue https://github.com/apache/airflow/issues/8605
   
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Target Github ISSUE in description if exists
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] resdevd edited a comment on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
resdevd edited a comment on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-623221335


   entrypoint.sh is maintained in the Dockerfile itself. This PR is a docker-compose setup using the official Airflow Docker image.
   If you explore the code, you can see the one-time setup of initdb and inituser, done by docker-exec'ing into the main Airflow container via a shell script.
   The idea is that users who need the initdb and inituser setup can run that script once after deploying docker-compose, or users can just use docker-compose as-is by replacing the env variables with an existing db connection string.
   In this PR, the containers are in restart mode, so users can run that script while docker-compose is active. IMO we shouldn't waste resources by having a container run just for a one-time action. In this PR, users won't have to rebuild images by changing the entrypoint.
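   For illustration, a minimal sketch of what such a one-time script could look like (the service name "webserver" and the exact commands are assumptions, not taken verbatim from this PR):
   ```bash
   # Run once from the host after `docker-compose up -d`.
   # "webserver" is an assumed compose service name; adjust to match the compose file.
   docker-compose exec webserver airflow initdb
   docker-compose exec webserver airflow create_user \
       --role Admin --username airflow --password airflow \
       -e airflow@example.com -f airflow -l airflow
   ```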
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] resdevd commented on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
resdevd commented on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-620956007


   @potiuk FYI


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-620954978


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://apache-airflow-slack.herokuapp.com/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stale[bot] closed pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
stale[bot] closed pull request #8621:
URL: https://github.com/apache/airflow/pull/8621


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] resdevd edited a comment on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
resdevd edited a comment on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-623221335


   entrypoint.sh is maintained in the Dockerfile itself. This PR is a docker-compose setup using the official Airflow Docker image.
   If you explore the code, you can see the one-time setup of initdb and inituser, done by docker-exec'ing into the main Airflow container via a shell script.
   The idea is that users who need the initdb and inituser setup can run that script once after deploying docker-compose, or users can just use docker-compose by replacing the env variables for the db connection string.
   In this PR, the containers are in restart mode, so users can run that script while docker-compose is active. IMO we shouldn't waste resources by having a container run just for a one-time action. In this PR, users won't have to rebuild images by changing the entrypoint.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] gpongracz commented on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
gpongracz commented on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-623185001


   Looks like entrypoint.sh does not run an initdb or upgradedb before the command, unlike what is found in puckel:
   https://github.com/puckel/docker-airflow/blob/master/script/entrypoint.sh#L110
   I think the puckel entrypoint script is very elegant and may help inform the design of this one...
   In the meantime I have written a docker-compose file for a LocalExecutor setup that works around this by setting restart to on-failure and starting a container which runs initdb and then stops after completing successfully - a bit of a kludge, but it gets the current image working...
   This is certainly not a production script, but it works with the current image.
   I've commented out a couple of optional mounts for dags and, in my case, AWS keys.
   If a feature similar to the puckel entrypoint above is developed, the webserver and scheduler definitions in the following docker-compose file could be collapsed into a single container.
   
   **_docker-compose.yml (working example that mitigates entrypoint.sh lacking initdb)_**
    ```yaml
    version: '3.7'
   services:
       postgres:
           image: postgres:10
           restart: on-failure
           environment:
               - POSTGRES_USER=airflow
               - POSTGRES_PASSWORD=airflow
               - POSTGRES_DB=airflow
           logging:
               options:
                   max-size: 10m
                   max-file: "3"
           ports:
               - "5432:5432"
       initdb:
           image: apache/airflow:latest
           restart: on-failure
           depends_on:
               - postgres
           environment:
               - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
               - AIRFLOW__CORE__EXECUTOR=LocalExecutor
           logging:
               options:
                   max-size: 10m
                   max-file: "3"
           command: initdb
       webserver:
           image: apache/airflow:latest
           restart: on-failure
           depends_on:
               - postgres
           environment:
               - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
               - AIRFLOW__CORE__LOAD_EXAMPLES=True
               - AIRFLOW__CORE__EXECUTOR=LocalExecutor
               - AIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080
           logging:
               options:
                   max-size: 10m
                   max-file: "3"
   #        volumes:
   #            - ./dags:/opt/airflow/dags
   #            - ~/.aws:/home/airflow/.aws
           ports:
               - "8080:8080"
           command: webserver
       scheduler:
           image: apache/airflow:latest
           restart: on-failure
           depends_on:
               - postgres
           environment:
               - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
               - AIRFLOW__CORE__LOAD_EXAMPLES=True
               - AIRFLOW__CORE__EXECUTOR=LocalExecutor
               - AIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080
           logging:
               options:
                   max-size: 10m
                   max-file: "3"
   #        volumes:
   #            - ./dags:/opt/airflow/dags
   #            - ~/.aws:/home/airflow/.aws
            command: scheduler
    ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] BasPH commented on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
BasPH commented on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-621632259


   I think we should decide whether the docker-compose file should serve as a "hello world" example, or as something more complex but closer to a production deployment.
   
   My ideas:
   - Users will likely want to use it as an example and change it for their specific deployment
   - I think these should be as simple as possible to run, i.e. a one-liner in the readme
   - I think the folder name "templates" is incorrect here; no templating is being done. Something like "examples/docker", or even "docker-compose" in the root of the project, makes more sense IMO
   - There are already plenty of example DAGs, let's not add yet another example
   
   To address your points about the extension fields:
   
   > Most production environments like ECS/Fargate and Docker swarm don't support them unlike docker env variables.
   
   I think the goal should be a simple example docker-compose file. People will then extract bits and pieces from the example into their deployment. There's a ton of different ways to do deployment, so I would not aim for creating a docker-compose file with the goal of being close to one single deployment method. Instead, aim for something easy to comprehend.
   
   > Users might want to use different images and volumes for different airflow services. (Airflow workers might need java, specific python packages etc.,)
   
   Okay, why does this change the argument for using the extension fields?
   
   > Much of code duplication is eliminated in form of .env master file already.
   
   If we want the docker-compose file to serve as an example, I think it should be as self-explanatory as possible. Storing env vars in a separate file does not help that cause, I think using extension fields and having everything in the same file will be clearer.
   
   > Readability and Adoption of extension-fields is still lagging behind with the majority of docker/docker-compose users.
   
   How so?
   
   > Container names should be unique.
   
   You still have unique container names.
   
   > Advanced users can always configure the config according to their needs.
   
   Yes.
   
   See the following docker-compose file using extension fields; I think it's pretty readable:
   
   ```yaml
   version: '3'
   
   # ========================== AIRFLOW ENVIRONMENT VARIABLES ===========================
   x-environment: &airflow_environment
     - AIRFLOW__CORE__EXECUTOR=LocalExecutor
     - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
     - AIRFLOW__CORE__LOAD_EXAMPLES=False
     - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@postgres:5432/airflow
     - AIRFLOW__CORE__STORE_DAG_CODE=True
     - AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
     - AIRFLOW__WEBSERVER__RBAC=True
   # ========================== /AIRFLOW ENVIRONMENT VARIABLES ==========================
   
   services:
     postgres:
       image: postgres:12-alpine
       environment:
         - POSTGRES_USER=airflow
         - POSTGRES_PASSWORD=airflow
         - POSTGRES_DB=airflow
       ports:
         - "5432:5432"
   
     webserver:
       image: apache/airflow:1.10.10-python3.7
       ports:
         - "8080:8080"
       environment: *airflow_environment
       command: webserver
   
     scheduler:
       image: apache/airflow:1.10.10-python3.7
       environment: *airflow_environment
       command: scheduler
   ```
   
   For the initialization I've used the following in the past; it worked quite okay. We could make it a bit more sophisticated, e.g. by checking some condition to be True once a second, instead of `sleep 5`:
   ```yaml
     initdb_adduser:
       image: apache/airflow:1.10.10-python3.7
       depends_on:
         - postgres
       environment: *airflow_environment
       entrypoint: /bin/bash
       # The webserver initializes permissions, so sleep for that to (approximately) be finished
       # No disaster if the webserver isn't finished by then, but create_user will start spitting out errors until the permissions exist
       command: -c 'airflow initdb && sleep 5 && airflow create_user --role Admin --username airflow --password airflow -e airflow@airflow.com -f airflow -l airflow'
   ```
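   A hedged sketch of that polling idea, i.e. what the `initdb_adduser` command could become, assuming `create_user` exits non-zero (harmlessly) until the webserver has created the default roles:
   ```bash
   # Replace the fixed `sleep 5` with a retry loop that polls once a second
   airflow initdb
   until airflow create_user --role Admin --username airflow --password airflow \
         -e airflow@airflow.com -f airflow -l airflow; do
     sleep 1
   done
   ```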


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] resdevd commented on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
resdevd commented on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-623221335


   entrypoint.sh is maintained in the Dockerfile itself. This PR is a docker-compose setup using the official Airflow Docker image.
   If you explore the code, you can see the one-time setup of initdb and inituser, done by docker-exec'ing into the main Airflow container via a shell script.
   The idea is that users who need the initdb and inituser setup can run that script once after deploying docker-compose, or users can just use docker-compose by replacing the env variables for the db connection string.
   In this PR, the containers are in restart mode, so users can run that script while docker-compose is active. IMO we shouldn't waste resources by having a container run just for a one-time action.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] resdevd commented on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
resdevd commented on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-621684011


   > I think we should decide whether the docker-compose file should serve as a "hello world" example, or as something more complex but closer to a production deployment.
   
   IMO the simplest way to run airflow is still pip install with airflow commands. A "hello world" example should not require users to learn docker and docker-compose. 
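   For comparison, that pip-based path is just the standard Airflow 1.10 commands:
   ```bash
   # Run Airflow locally without any containers (SQLite + SequentialExecutor by default)
   pip install apache-airflow
   airflow initdb
   airflow webserver -p 8080 &
   airflow scheduler
   ```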
   
   > I think the folder name "templates" is incorrect here; no templating is being done. Something like "examples/docker", or even "docker-compose" in the root of the project, makes more sense IMO
   
   Sure, as @turbaszek mentioned, we could use a name like "deploy" etc., where more docker-compose configs with LocalExecutor, and potentially even k8s/helm configs, can be added in the future. IMO "examples" would be confusing, as it might refer to example dags and operators. 
   
   > There are already plenty of example DAGs, let's not add yet another example
   
   It wasn't about adding an example; the goal was to show the usage of docker volumes for dags and plugins.
   
   > I think the goal should be a simple example docker-compose file. People will then extract bits and pieces from the example into their deployment. There's a ton of different ways to do deployment, so I would not aim for creating a docker-compose file with the goal of being close to one single deployment method. Instead, aim for something easy to comprehend.
   
   This PR is addressing https://github.com/apache/airflow/issues/8605 "Add Production-ready docker compose for the production image". 
   IMO CeleryExecutor is probably the most complex config in terms of docker-compose, and it's easier to change from CeleryExecutor to LocalExecutor than vice versa. 
   
   > Okay, why does this change the argument for using the extension fields?
   
   I was referring to the specific code context in the original review where extension fields were discussed.
   
   > If we want the docker-compose file to serve as an example, I think it should be as self-explanatory as possible. Storing env vars in a separate file does not help that cause, I think using extension fields and having everything in the same file will be clearer.
   
   As mentioned above, this PR addresses an issue whose goal is a docker-compose setup close to production. Having fernet keys and other sensitive info live in the docker-compose file is not a best practice either (in terms of .gitignore config and migration to ECS/Fargate, Docker Swarm, etc.). 
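   A minimal sketch of that practice (the variable names are real Airflow settings; the values are placeholders): keep the sensitive values in a git-ignored .env file and point compose at it, e.g. via `env_file`:
   ```bash
   # .env (add to .gitignore; placeholder values only)
   AIRFLOW__CORE__FERNET_KEY=<generated-fernet-key>
   AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
   ```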
   
   
   > > Readability and Adoption of extension-fields is still lagging behind with the majority of docker/docker-compose users.
   >
   > How so?
   
   Glad you asked. Prior to Airflow's official Docker images, the two most popular docker-compose Airflow codebases were https://github.com/puckel/docker-airflow and https://github.com/bitnami/bitnami-docker-airflow . I don't think extension fields are used in either.
   Not only that, I don't think you'd find them used extensively even in Docker's official https://github.com/docker/awesome-compose repo. You might notice the same pattern in some of the most popular docker-compose codebases, like https://github.com/deviantony/docker-elk and https://github.com/vegasbrianc/prometheus .
   
   For context, extension fields were introduced in Compose file format 3.4 in 2017. 
   
   > You still have unique container names.
   
   I was referring to the specific code context in the original review where extension fields were discussed.
   
   > For the initialization I've used this thing in the past, worked quite okay. We could make it a bit more sophisticated, e.g. by checking some condition to be True once a second, instead of sleep 5:
   
   This is a separate issue (https://github.com/apache/airflow/issues/8606); if env variables were used for this purpose, we could get rid of the init script entirely. Besides, containers share CPU, memory, network and IO, so I'm not sure adding another container for a one-time action is the best way to go.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stale[bot] commented on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-646931322


   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] resdevd edited a comment on pull request #8621: Docker Compose CeleryExecutor example

Posted by GitBox <gi...@apache.org>.
resdevd edited a comment on pull request #8621:
URL: https://github.com/apache/airflow/pull/8621#issuecomment-621684011


   > I think we should decide whether the docker-compose file should serve as a "hello world" example, or as something more complex but closer to a production deployment.
   
   IMO the simplest way to run airflow is still pip install with airflow commands. A "hello world" example should not require users to install/learn docker and docker-compose. 
   
   > I think the folder name "templates" is incorrect here; no templating is being done. Something like "examples/docker", or even "docker-compose" in the root of the project, makes more sense IMO
   
   Sure, as @turbaszek mentioned, we could use a name like "deploy" etc., where more docker-compose configs with LocalExecutor, and potentially k8s/helm configs, can be added in the future. IMO "examples" would be confusing, as it might refer to example dags and operators. 
   
   > There are already plenty of example DAGs, let's not add yet another example
   
   It wasn't about adding an example; the goal was to show the usage of docker volumes for dags and plugins.
   
   > I think the goal should be a simple example docker-compose file. People will then extract bits and pieces from the example into their deployment. There's a ton of different ways to do deployment, so I would not aim for creating a docker-compose file with the goal of being close to one single deployment method. Instead, aim for something easy to comprehend.
   
   This PR is addressing https://github.com/apache/airflow/issues/8605 "Add Production-ready docker compose for the production image". 
   IMO CeleryExecutor is probably the most advanced config in terms of docker-compose, and it's easier to change from CeleryExecutor to LocalExecutor than vice versa. 
   
   > Okay, why does this change the argument for using the extension fields?
   
   I was referring to the specific code context in the original review where extension fields were discussed.
   
   > If we want the docker-compose file to serve as an example, I think it should be as self-explanatory as possible. Storing env vars in a separate file does not help that cause, I think using extension fields and having everything in the same file will be clearer.
   
   As mentioned above, this PR addresses an issue whose goal is a docker-compose setup close to production. Having fernet keys and other sensitive info live in the docker-compose file is not a best practice either (in terms of .gitignore config and migration to ECS/Fargate, Docker Swarm, etc.). 
   
   
   > > Readability and Adoption of extension-fields is still lagging behind with the majority of docker/docker-compose users.
   >
   > How so?
   
   Glad you asked. Prior to Airflow's official Docker images, the two most popular docker-compose Airflow codebases were https://github.com/puckel/docker-airflow and https://github.com/bitnami/bitnami-docker-airflow . I don't think extension fields are used in either.
   Not only that, I don't think you'd find them used extensively even in Docker's official https://github.com/docker/awesome-compose repo. You might notice the same pattern in some of the most popular docker-compose codebases, like https://github.com/deviantony/docker-elk and https://github.com/vegasbrianc/prometheus .
   
   For context, extension fields were introduced in Compose file format 3.4 in 2017. 
   
   > You still have unique container names.
   
   I was referring to the specific code context in the original review where extension fields were discussed.
   
   > For the initialization I've used this thing in the past, worked quite okay. We could make it a bit more sophisticated, e.g. by checking some condition to be True once a second, instead of sleep 5:
   
   This is a separate issue (https://github.com/apache/airflow/issues/8606); if env variables were used for this purpose, we could get rid of the init script entirely. Besides, containers share CPU, memory, network and IO, so I'm not sure adding another container for a one-time action is the best way to go.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org