Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/14 14:33:42 UTC
[GitHub] [airflow] sstoefe opened a new issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
sstoefe opened a new issue #13675:
URL: https://github.com/apache/airflow/issues/13675
**Apache Airflow version**: v2.0.0
**Git Version**: release:2.0.0+ab5f770bfcd8c690cbe4d0825896325aca0beeca
**Docker version**: Docker version 20.10.1, build 831ebeae96
**Environment**:
- **Cloud provider or hardware configuration**: local setup, docker engine in swarm mode, docker stack deploy
- **OS** (e.g. from /etc/os-release): Manjaro Linux
- **Kernel** (e.g. `uname -a`): 5.9.11
- **Install tools**:
- docker airflow image apache/airflow:2.0.0-python3.8 (hash _fe4a64af9553_)
- **Others**:
**What happened**:
When using `DockerSwarmOperator` (from either the `contrib` or the `providers` module) together with the default `enable_logging=True` option, tasks do not succeed and stay in the `running` state. When checking the `docker service logs` I can clearly see that the container ran and ended successfully. Airflow, however, does not recognize that the container finished and keeps the tasks in the `running` state.
However, when using `enable_logging=False` AND `auto_remove=False`, containers are recognized as finished and tasks correctly end up in the `success` state. When using `enable_logging=False` and `auto_remove=True`, I get the following error message:
```
{taskinstance.py:1396} ERROR - 404 Client Error: Not Found ("service 936om1s4zso10ye5ferhvwnxn not found")
```
<!-- (please include exact error messages if you can) -->
**What you expected to happen**:
When I run a DAG containing `DockerSwarmOperator` tasks, I expect the Docker containers to be distributed to the Docker swarm and the container logs and states to be tracked correctly by the `DockerSwarmOperator`. That is, with `enable_logging=True` I would expect the TaskInstance's log to contain the logging output of the Docker container/service. Furthermore, with `auto_remove=True` I would expect the Docker services to be removed after the TaskInstance finishes successfully.
<!-- What do you think went wrong? -->
It looks like something is broken with the `enable_logging` and `auto_remove=True` options.
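For context, a Swarm service is finished only once all of its tasks reach a terminal state, which is what the operator ultimately has to detect. A minimal sketch of that check (hypothetical helper names; the operator's actual implementation may differ), based on the task-state dictionaries the Docker API returns:

```python
# Hypothetical helpers sketching how completion of a swarm service can be
# judged from its task states; the real DockerSwarmOperator logic may differ.

# Terminal task states per the Docker Swarm task lifecycle.
TERMINAL_STATES = {"complete", "failed", "shutdown", "rejected", "orphaned", "remove"}


def service_terminated(tasks):
    """True once every task of the service has reached a terminal state."""
    return bool(tasks) and all(t["Status"]["State"] in TERMINAL_STATES for t in tasks)


def service_succeeded(tasks):
    """True when the service terminated and every task completed cleanly."""
    return service_terminated(tasks) and all(
        t["Status"]["State"] == "complete" for t in tasks
    )
```

With `enable_logging=True` the operator streams logs in addition to performing a check like this, which is where the hang appears to occur.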
**How to reproduce it**:
#### **`Dockerfile`**
```dockerfile
FROM apache/airflow:2.0.0-python3.8
ARG DOCKER_GROUP_ID
USER root
RUN groupadd --gid $DOCKER_GROUP_ID docker \
    && usermod -aG docker airflow
USER airflow
```
The `airflow` user needs to be in the `docker` group to have access to the Docker daemon socket.
#### **build the Dockerfile**
```bash
docker build --build-arg DOCKER_GROUP_ID=$(getent group docker | awk -F: '{print $3}') -t docker-swarm-bug .
```
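As an alternative to the `getent`/`awk` pipeline, the group ID can also be looked up with Python's standard `grp` module (a minimal sketch, Unix only; `group_gid` is a hypothetical helper name):

```python
# Look up a group's numeric GID via the standard library (Unix only).
import grp


def group_gid(name):
    """Return the numeric GID for the given group name."""
    return grp.getgrnam(name).gr_gid
```

Passing `"docker"` yields the value used for the `DOCKER_GROUP_ID` build arg above.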
#### **`docker-stack.yml`**
```yaml
version: "3.2"

networks:
  airflow:

services:
  postgres:
    image: postgres:13.1
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_DB=airflow
      - POSTGRES_PASSWORD=airflow
      - PGDATA=/var/lib/postgresql/data/pgdata
    ports:
      - 5432:5432
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./database/data:/var/lib/postgresql/data/pgdata
      - ./database/logs:/var/lib/postgresql/data/log
    command: >
      postgres
      -c listen_addresses=*
      -c logging_collector=on
      -c log_destination=stderr
      -c max_connections=200
    networks:
      - airflow

  redis:
    image: redis:5.0.5
    environment:
      REDIS_HOST: redis
      REDIS_PORT: 6379
    ports:
      - 6379:6379
    networks:
      - airflow

  webserver:
    image: docker-swarm-bug:latest
    env_file:
      - .env
    ports:
      - 8080:8080
    volumes:
      - ./airflow_files/dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    depends_on:
      - postgres
      - redis
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
    networks:
      - airflow

  flower:
    image: docker-swarm-bug:latest
    env_file:
      - .env
    ports:
      - 5555:5555
    depends_on:
      - redis
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    volumes:
      - ./logs:/opt/airflow/logs
    command: celery flower
    networks:
      - airflow

  scheduler:
    image: docker-swarm-bug:latest
    env_file:
      - .env
    volumes:
      - ./airflow_files/dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    command: scheduler
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    networks:
      - airflow

  worker:
    image: docker-swarm-bug:latest
    env_file:
      - .env
    volumes:
      - ./airflow_files/dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    command: celery worker
    depends_on:
      - scheduler
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    networks:
      - airflow

  initdb:
    image: docker-swarm-bug:latest
    env_file:
      - .env
    volumes:
      - ./airflow_files/dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    entrypoint: /bin/bash
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 5
    command: -c "airflow db init && airflow users create --firstname admin --lastname admin --email admin --password admin --username admin --role Admin"
    depends_on:
      - redis
      - postgres
    networks:
      - airflow
```
#### **`docker_swarm_bug.py`**
```python
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator

# you can also try DockerSwarmOperator from the contrib module, shouldn't make a difference
# from airflow.contrib.operators.docker_swarm_operator import DockerSwarmOperator

default_args = {
    "owner": "airflow",
    "start_date": "2021-01-14",
}

with DAG(
    "docker_swarm_bug", default_args=default_args, schedule_interval="@once"
) as dag:
    start_op = BashOperator(
        task_id="start_op", bash_command="echo start testing multiple dockers",
    )

    docker_swarm = list()
    for i in range(16):
        docker_swarm.append(
            DockerSwarmOperator(
                task_id=f"docker_swarm_{i}",
                image="hello-world:latest",
                force_pull=True,
                auto_remove=True,
                api_version="auto",
                docker_url="unix://var/run/docker.sock",
                network_mode="bridge",
                enable_logging=False,
            )
        )

    finish_op = BashOperator(
        task_id="finish_op", bash_command="echo finish testing multiple dockers",
    )

    start_op >> docker_swarm >> finish_op
```
#### **create directories, copy DAG and set permissions**
```bash
mkdir -p airflow_files/dags
cp docker_swarm_bug.py airflow_files/dags/
mkdir logs
mkdir files
sudo chown -R 50000 airflow_files logs files
```
UID 50000 is the ID of the `airflow` user inside the Docker images.
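The directory layout above can equivalently be prepared from Python (a minimal sketch with a hypothetical `prepare_layout` helper; the `chown` to UID 50000 is left to `sudo` since it requires root):

```python
import os
import shutil


def prepare_layout(base, dag_file="docker_swarm_bug.py"):
    """Create the dags/logs/files directories and copy the DAG file in."""
    dags = os.path.join(base, "airflow_files", "dags")
    os.makedirs(dags, exist_ok=True)
    os.makedirs(os.path.join(base, "logs"), exist_ok=True)
    os.makedirs(os.path.join(base, "files"), exist_ok=True)
    if os.path.exists(dag_file):
        shutil.copy(dag_file, dags)
    return dags
```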
#### **deploy `docker-stack.yml`**
```bash
docker stack deploy --compose-file docker-stack.yml airflow
```
#### **trigger DAG `docker_swarm_bug` in UI**
**Anything else we need to know**:
The problem occurs with the option `enable_logging=True`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] etra commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
etra commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-826760023
Hello,
Is there a workaround for this bug? I believe I have the same issue: with `enable_logging=True` the DAG gets stuck and fails even though the container completed successfully.
Thank you
[GitHub] [airflow] github-actions[bot] closed issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
github-actions[bot] closed issue #13675:
URL: https://github.com/apache/airflow/issues/13675
[GitHub] [airflow] boring-cyborg[bot] commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
boring-cyborg[bot] commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-760234248
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] atutkus removed a comment on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
atutkus removed a comment on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-826757942
Is there a workaround for this bug? I need to see the errors from the container.
Thank you
[GitHub] [airflow] github-actions[bot] commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
github-actions[bot] commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-981180994
This issue has been closed because it has not received response from the issue author.
[GitHub] [airflow] alexcolpitts96 commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
alexcolpitts96 commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-1011244694
@eladkal Sorry to bump an old issue, but it seems to persist with version release:2.2.3+06c82e17e9d7ff1bf261357e84c6013ccdb3c241.
Containers are spawned, complete successfully, and are removed, but Airflow does not mark them as completed if `enable_logging=True`.
[GitHub] [airflow] michaelfresco commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
michaelfresco commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-780218936
Hi @sstoefe, thanks for reporting this bug. I can confirm that with `enable_logging=False` the DAG doesn't get stuck and indeed finishes (logging is nice, though!). In previous versions (e.g. Airflow 1.10.8) the Docker Swarm operator doesn't show this behaviour. (By the way, if you compare the [current](https://github.com/apache/airflow/blob/25d68a7a9e0b4481486552ece9e77bcaabfa4de2/airflow/providers/docker/operators/docker_swarm.py) version with the way it [used to be](https://github.com/apache/airflow/blob/3e2a02751cf890b780bc26b40c7cee7f1f4e0bd9/airflow/contrib/operators/docker_swarm_operator.py), you can see a lot of changes with regard to the way logging is handled.)
[GitHub] [airflow] github-actions[bot] commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
github-actions[bot] commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-968178525
This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.
[GitHub] [airflow] sstoefe commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
sstoefe commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-778811187
The `auto_remove` bug is fixed in #13852
[GitHub] [airflow] atutkus commented on issue #13675: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
atutkus commented on issue #13675:
URL: https://github.com/apache/airflow/issues/13675#issuecomment-826757942
Is there a workaround for this bug? I need to see the errors from the container.
Thank you