Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/14 14:33:42 UTC

[GitHub] [airflow] sstoefe opened a new issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

sstoefe opened a new issue #13675:
URL: https://github.com/apache/airflow/issues/13675


   
   
   **Apache Airflow version**: v2.0.0
   **Git Version**: release:2.0.0+ab5f770bfcd8c690cbe4d0825896325aca0beeca
   
   
   **Docker version**: Docker version 20.10.1, build 831ebeae96
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: local setup, docker engine in swarm mode, docker stack deploy
   - **OS** (e.g. from /etc/os-release): Manjaro Linux
   - **Kernel** (e.g. `uname -a`): 5.9.11
   - **Install tools**: 
     - Airflow Docker image `apache/airflow:2.0.0-python3.8` (hash _fe4a64af9553_)
   - **Others**:
   
   **What happened**:
   
   When using `DockerSwarmOperator` (from either the `contrib` or the `providers` module) together with the default `enable_logging=True` option, tasks do not succeed and stay in state `running`. When checking the `docker service logs` I can clearly see that the container ran and ended successfully. Airflow, however, does not recognize that the container finished and keeps the tasks in state `running`.
   
   However, when using `enable_logging=False` **and** `auto_remove=False`, containers are recognized as finished and tasks correctly end up in state `success`. When using `enable_logging=False` and `auto_remove=True`, I get the following error message:
   ```
   {taskinstance.py:1396} ERROR - 404 Client Error: Not Found ("service 936om1s4zso10ye5ferhvwnxn not found")
   ```
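   
   As a sanity check, the state of the swarm task behind a stuck TaskInstance can be queried directly with the `docker` Python SDK; `my_service_id` below is a placeholder for the real service ID (e.g. taken from `docker service ls`):
   
   ```python
   # Sketch: inspect the swarm task state behind a stuck TaskInstance.
   # "my_service_id" is a placeholder for the real service ID.
   import docker
   
   client = docker.from_env()
   service = client.services.get("my_service_id")
   for task in service.tasks():
       # A finished one-shot service task reports state "complete".
       print(task["ID"], task["Status"]["State"])
   ```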
   
   
   **What you expected to happen**:
   
   When I run a DAG with `DockerSwarmOperator`s in it, I expect the Docker containers to be distributed across the Docker swarm and the container logs and states to be correctly tracked by the `DockerSwarmOperator`. That is, with the `enable_logging=True` option I would expect the TaskInstance's log to contain the logging output of the Docker container/service. Furthermore, with the `auto_remove=True` option I would expect the Docker services to be removed after the TaskInstance finishes successfully.
   
   It looks like something is broken with the `enable_logging` and `auto_remove=True` options.
   
   **How to reproduce it**:
   #### **`Dockerfile`**
   ```dockerfile
   FROM apache/airflow:2.0.0-python3.8
   
   ARG DOCKER_GROUP_ID
   
   USER root
   
   RUN groupadd --gid $DOCKER_GROUP_ID docker \
       && usermod -aG docker airflow
   
   USER airflow
   ```
   
   The `airflow` user needs to be in the `docker` group to have access to the Docker daemon.
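   
   To verify that the socket is actually reachable as the `airflow` user, a minimal check with the `docker` Python SDK (which the Docker provider pulls in) is:
   
   ```python
   # Sanity check, run inside the Airflow container as the airflow user.
   # from_env() falls back to the mounted unix socket; ping() raises an
   # APIError if the daemon is unreachable.
   import docker
   
   client = docker.from_env()
   print(client.ping())  # True if /var/run/docker.sock is accessible
   ```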
   
   #### **build the Dockerfile**
   ```bash
   docker build --build-arg DOCKER_GROUP_ID=$(getent group docker | awk -F: '{print $3}') -t docker-swarm-bug .
   ```
   
   #### **`docker-stack.yml`**
   ```yaml
   version: "3.2"
   networks:
     airflow:
   
   services:
     postgres:
       image: postgres:13.1
       environment:
         - POSTGRES_USER=airflow
         - POSTGRES_DB=airflow
         - POSTGRES_PASSWORD=airflow
         - PGDATA=/var/lib/postgresql/data/pgdata
       ports:
         - 5432:5432
       volumes:
         - /var/run/docker.sock:/var/run/docker.sock
         - ./database/data:/var/lib/postgresql/data/pgdata
         - ./database/logs:/var/lib/postgresql/data/log
       command: >
         postgres
           -c listen_addresses=*
           -c logging_collector=on
           -c log_destination=stderr
           -c max_connections=200
       networks:
         - airflow
     redis:
       image: redis:5.0.5
       environment:
         REDIS_HOST: redis
         REDIS_PORT: 6379
       ports:
         - 6379:6379
       networks:
         - airflow
     webserver:
       env_file:
         - .env
       image: docker-swarm-bug:latest
       ports:
         - 8080:8080
       volumes:
         - ./airflow_files/dags:/opt/airflow/dags
         - ./logs:/opt/airflow/logs
         - ./files:/opt/airflow/files
         - /var/run/docker.sock:/var/run/docker.sock
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 3
       depends_on:
         - postgres
         - redis
       command: webserver
       healthcheck:
         test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
         interval: 30s
         timeout: 30s
         retries: 3
       networks:
         - airflow
     flower:
       image: docker-swarm-bug:latest
       env_file:
         - .env
       ports:
         - 5555:5555
       depends_on:
         - redis
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 3
       volumes:
         - ./logs:/opt/airflow/logs
       command: celery flower
       networks:
         - airflow
     scheduler:
       image: docker-swarm-bug:latest
       env_file:
         - .env
       volumes:
         - ./airflow_files/dags:/opt/airflow/dags
         - ./logs:/opt/airflow/logs
         - ./files:/opt/airflow/files
         - /var/run/docker.sock:/var/run/docker.sock
       command: scheduler
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 3
       networks:
         - airflow
     worker:
       image: docker-swarm-bug:latest
       env_file:
         - .env
       volumes:
         - ./airflow_files/dags:/opt/airflow/dags
         - ./logs:/opt/airflow/logs
         - ./files:/opt/airflow/files
         - /var/run/docker.sock:/var/run/docker.sock
       command: celery worker
       depends_on:
         - scheduler
   
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 3
       networks:
         - airflow
     initdb:
       image: docker-swarm-bug:latest
       env_file:
         - .env
       volumes:
         - ./airflow_files/dags:/opt/airflow/dags
         - ./logs:/opt/airflow/logs
         - ./files:/opt/airflow/files
         - /var/run/docker.sock:/var/run/docker.sock
       entrypoint: /bin/bash
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 5
       command: -c "airflow db init && airflow users create --firstname admin --lastname admin --email admin --password admin --username admin --role Admin"
       depends_on:
         - redis
         - postgres
       networks:
         - airflow
   ```
   
   #### **`docker_swarm_bug.py`**
   ```python
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator
   # you can also try DockerSwarmOperator from contrib module, shouldn't make a difference
   # from airflow.contrib.operators.docker_swarm_operator import DockerSwarmOperator
   
   default_args = {
       "owner": "airflow",
       "start_date": "2021-01-14"
   }
   
   with DAG(
       "docker_swarm_bug", default_args=default_args, schedule_interval="@once"
   ) as dag:
       start_op = BashOperator(
           task_id="start_op", bash_command="echo start testing multiple dockers",
       )
   
       docker_swarm = list()
       for i in range(16):
           docker_swarm.append(
               DockerSwarmOperator(
                   task_id=f"docker_swarm_{i}",
                   image="hello-world:latest",
                   force_pull=True,
                   auto_remove=True,
                   api_version="auto",
                   docker_url="unix://var/run/docker.sock",
                   network_mode="bridge",
                   enable_logging=False,  # set to True to reproduce the reported hang
               )
           )
   
       finish_op = BashOperator(
           task_id="finish_op", bash_command="echo finish testing multiple dockers",
       )
   
       start_op >> docker_swarm >> finish_op
   ```
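   
   Until the logging path is fixed, a possible mitigation (not a fix) is Airflow's standard `execution_timeout` argument, which at least fails a hung task instead of leaving it in `running` forever; a minimal sketch:
   
   ```python
   # Mitigation sketch only: bound how long a stuck task may stay "running".
   # After the timeout Airflow kills the task and marks it failed.
   from datetime import timedelta
   
   from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator
   
   bounded = DockerSwarmOperator(
       task_id="docker_swarm_bounded",
       image="hello-world:latest",
       docker_url="unix://var/run/docker.sock",
       enable_logging=True,
       execution_timeout=timedelta(minutes=10),  # standard BaseOperator argument
   )
   ```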
   
   #### **create directories, copy DAG and set permissions**
   ```bash
   mkdir -p airflow_files/dags
   cp docker_swarm_bug.py airflow_files/dags/
   mkdir logs
   mkdir files
   sudo chown -R 50000 airflow_files logs files
   ```
   UID 50000 is the ID of the `airflow` user inside the Docker images.
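   
   If tasks later fail with permission errors, the ownership can be double-checked from the host project directory, e.g.:
   
   ```python
   # Quick check that the bind-mounted directories are owned by UID 50000
   # (the airflow user inside the image); run from the host project dir.
   import os
   
   for path in ("airflow_files", "logs", "files"):
       print(path, os.stat(path).st_uid)  # expect 50000 after the chown
   ```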
   
   #### **deploy `docker-stack.yml`**
   ```bash
   docker stack deploy --compose-file docker-stack.yml airflow
   ```
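   
   As a quick post-deploy check, the services created for the `airflow` stack can be listed via the `docker` SDK (the label filter below is the one `docker stack deploy` sets on its services):
   
   ```python
   # List the services of the "airflow" stack and their replica counts.
   import docker
   
   client = docker.from_env()
   stack_filter = {"label": "com.docker.stack.namespace=airflow"}
   for service in client.services.list(filters=stack_filter):
       mode = service.attrs["Spec"]["Mode"]
       print(service.name, mode.get("Replicated", {}).get("Replicas", "global"))
   ```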
   
   #### **trigger DAG `docker_swarm_bug` in UI**
   
   **Anything else we need to know**:
   
   The problem occurs with the option `enable_logging=True`.





[GitHub] [airflow] etra commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

etra commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-826760023


   Hello,
   Is there a workaround for this bug? I believe I have the same issue: with `enable_logging=True` the DAG gets stuck and fails even though the container has completed successfully.
   
   Thank you





[GitHub] [airflow] github-actions[bot] closed issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

github-actions[bot] closed issue #13675:
URL: https://github.com/apache/airflow/issues/13675


   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

boring-cyborg[bot] commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-760234248


   Thanks for opening your first issue here! Be sure to follow the issue template!
   








[GitHub] [airflow] github-actions[bot] commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

github-actions[bot] commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-981180994


   This issue has been closed because it has not received a response from the issue author.





[GitHub] [airflow] alexcolpitts96 commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

alexcolpitts96 commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-1011244694


   @eladkal Sorry to bump an old issue, but it seems to persist with version release:2.2.3+06c82e17e9d7ff1bf261357e84c6013ccdb3c241
   
   Containers are spawned, complete successfully, and are removed, but Airflow does not mark them as completed if `enable_logging=True`.





[GitHub] [airflow] michaelfresco commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

michaelfresco commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-780218936


   Hi @sstoefe, thanks for reporting this bug. I can confirm that with `enable_logging=False` the DAG doesn't get stuck and indeed finishes (logging is nice, though!). In previous versions (e.g. Airflow 1.10.8) the Docker Swarm operator did not have this behaviour. (By the way, if you compare the [current](https://github.com/apache/airflow/blob/25d68a7a9e0b4481486552ece9e77bcaabfa4de2/airflow/providers/docker/operators/docker_swarm.py) version with the way it [used to be](https://github.com/apache/airflow/blob/3e2a02751cf890b780bc26b40c7cee7f1f4e0bd9/airflow/contrib/operators/docker_swarm_operator.py), you can see a lot of changes with respect to the way logging is handled.)
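   
   To make the difference concrete, here is a rough, simplified illustration (not the operator's actual code): polling the service's task state terminates on its own, while consuming a `follow=True` log stream only ends when the daemon closes it, which is where a task can hang.
   
   ```python
   # Rough illustration only, not the operator's real implementation.
   # "my_service_id" is a placeholder for a real swarm service ID.
   import time
   
   import docker
   
   api = docker.APIClient(base_url="unix://var/run/docker.sock")
   service_id = "my_service_id"
   
   # Poll-style approach: the loop ends once every task of the service
   # reaches a terminal state (assumes the service has at least one task).
   while True:
       tasks = api.tasks(filters={"service": service_id})
       if tasks and all(
           t["Status"]["State"] in ("complete", "failed", "shutdown")
           for t in tasks
       ):
           break
       time.sleep(1)
   
   # Follow-style approach: this generator yields log chunks but only
   # stops when the daemon closes the stream; if it never does, the
   # consumer (and thus the TaskInstance) blocks here indefinitely.
   for line in api.service_logs(service_id, stdout=True, stderr=True, follow=True):
       print(line.decode(errors="replace"), end="")
   ```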





[GitHub] [airflow] github-actions[bot] commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

github-actions[bot] commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-968178525


   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.





[GitHub] [airflow] sstoefe commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

sstoefe commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-778811187


   The `auto_remove` bug is fixed in #13852





[GitHub] [airflow] atutkus commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

atutkus commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-826757942


   Is there a workaround for this bug?
   I need to see the errors from the container.
   Thank you

