Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/15 17:19:12 UTC

[GitHub] [airflow] edg956 opened a new issue #14809: DockerOperator downloads all tags found

edg956 opened a new issue #14809:
URL: https://github.com/apache/airflow/issues/14809


   **Apache Airflow version**:
   Containerized airflow version 2.0.1
   
   **Environment**:
   Running on WSL2 the following distro:
   ```
   > cat /etc/os-release
   NAME="Ubuntu"
   VERSION="20.04.1 LTS (Focal Fossa)"
   ID=ubuntu
   ID_LIKE=debian
   PRETTY_NAME="Ubuntu 20.04.1 LTS"
   VERSION_ID="20.04"
   
   > uname -a
   Linux pinzas 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
   ```
   
   **What happened**:
   
   While looking for a way to make my task pull Docker images, I found issue #13905 and, following what I saw there, declared a task using `DockerOperator` with the following parameters:
   ```
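   from airflow.providers.docker.operators.docker import DockerOperator

   # `dag` is assumed to be the DAG object defined earlier in the DAG file.
   t3 = DockerOperator(
       task_id="dockerized_step",
       image="ubuntu",  # no tag specified
       api_version="auto",
       auto_remove=True,
       command="sleep 5",
       network_mode="bridge",
       force_pull=True,
       dag=dag,
   )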
   ```
   I then ran the DAG containing that task.
   
   **What you expected to happen**:
   I expected this step of the DAG to pull `ubuntu:latest`, since that is stated to be the default behaviour, run the command given in the `DockerOperator` call, and finish successfully.
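
A possible workaround sketch, assuming the extra pulls are triggered by the image reference carrying no tag: make the tag explicit before handing the image to the operator. `with_default_tag` is a hypothetical helper written for illustration; it is not part of Airflow or the Docker provider, and whether it avoids the behaviour reported here is untested.

```python
def with_default_tag(image: str, default_tag: str = "latest") -> str:
    """Return `image` with an explicit tag.

    References that already carry a tag or a digest are returned unchanged.
    Hypothetical helper, not an Airflow or Docker API.
    """
    if "@" in image:
        # Digest reference such as ubuntu@sha256:...; already unambiguous.
        return image
    # Only the last path segment can carry a tag. A colon earlier in the
    # string is a registry port, for example localhost:5000/ubuntu.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return image
    return f"{image}:{default_tag}"


print(with_default_tag("ubuntu"))  # ubuntu:latest
```

In the task above this would mean passing `image=with_default_tag("ubuntu")`, or simply writing `image="ubuntu:latest"` by hand.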
   
   **How to reproduce it**:
   
   - Get the docker-compose file from Airflow's documentation
   - Create a Dockerfile containing the following directives:
   
   ```
   FROM apache/airflow:2.0.1
   
   RUN python -m pip install apache-airflow-providers-docker
   ```
   
   - Modify the compose file to build from the Dockerfile you just created
   - Modify the compose file to mount the docker socket
   - Make sure the Docker socket is reachable by Airflow*
   - Add a DAG file containing at least the task in question
   - Run `docker-compose up`
   - Go to the UI and run the DAG
   - Check the logs for the task
   
   \* In my case, I also added an entrypoint that finds the GID of the socket, creates a group with that GID, and adds the `airflow` user to it. Airflow then runs as the `airflow` user. I'm sure there are better ways, but so far this is the only one I could find.
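
The entrypoint described in the footnote is not included in the report. A minimal sketch of the same idea, assuming the socket is mounted at `/var/run/docker.sock` and that `groupadd` and `usermod` are available in the image, might look like the following. The function name and the group name `dockerhost` are hypothetical, and the sketch only builds the commands rather than running them.

```python
import os


def socket_group_commands(socket_path="/var/run/docker.sock",
                          user="airflow", group="dockerhost"):
    """Build the commands that give `user` access to the socket's group.

    The socket path, user name and group name are assumptions made for
    this sketch, not values taken from the original entrypoint.
    """
    # The numeric group ID that owns the mounted socket.
    gid = os.stat(socket_path).st_gid
    return [
        # Create a group inside the container with the same GID.
        ["groupadd", "--gid", str(gid), group],
        # Add the user to that group as a supplementary group.
        ["usermod", "--append", "--groups", group, user],
    ]


if __name__ == "__main__" and os.path.exists("/var/run/docker.sock"):
    for command in socket_group_commands():
        print(" ".join(command))
```

A real entrypoint would run these commands as root (for example with `subprocess.run(command, check=True)`) before dropping to the `airflow` user, and would need to handle the case where a group with that GID already exists.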
   
   After waiting a long time, I noticed that the task had started what looked like a loop, downloading every tag in Ubuntu's Docker Hub repository. Here are the (shortened) logs and the output of `docker images`.
   
   - Logs:
   ```
   [2021-03-15 16:37:16,788] {docker.py:303} INFO - bionic-20200630: Pulling from library/ubuntu
   [2021-03-15 16:37:17,306] {docker.py:303} INFO - a1125296b23d: Pulling fs layer
   [2021-03-15 16:37:17,306] {docker.py:303} INFO - 3c742a4a0f38: Pulling fs layer
   ...
   [2021-03-15 16:37:20,620] {docker.py:298} INFO - Digest: sha256:e5b0b89c846690afe2ce325ac6c6bc3d686219cfa82166fc75c812c1011f0803
   [2021-03-15 16:37:42,678] {docker.py:303} INFO - bionic-20201119: Pulling from library/ubuntu
   ...
   [2021-03-15 16:37:46,171] {docker.py:298} INFO - Digest: sha256:fd25e706f3dea2a5ff705dbc3353cf37f08307798f3e360a13e9385840f73fb3
   [2021-03-15 16:38:24,340] {docker.py:303} INFO - cosmic-20181114: Pulling from library/ubuntu
   ...
   [2021-03-15 16:38:28,497] {docker.py:298} INFO - Digest: sha256:20b5d52b03712e2ba8819eb53be07612c67bb87560f121cc195af27208da10e0
   [2021-03-15 16:39:11,896] {docker.py:303} INFO - devel: Pulling from library/ubuntu
   [2021-03-15 16:39:12,282] {docker.py:298} INFO - Digest: sha256:2fc51f401cb873bfec33022d065efacbaf868b2e23f4dd76d7230d129258e255
   [2021-03-15 16:40:20,732] {docker.py:303} INFO - disco-20191011: Pulling from library/ubuntu
   ...
   [2021-03-15 16:40:24,706] {docker.py:298} INFO - Digest: sha256:59276de55c6aa123d06071a531b7e11c0c7e98e6a7c2f3c87c9789a513e4cd00
   [2021-03-15 16:40:25,755] {docker.py:303} INFO - disco-20191030: Pulling from library/ubuntu
   ...
   [2021-03-15 16:40:29,732] {docker.py:298} INFO - Digest: sha256:994afd4700257cf708b1a8ded7b94d70326a814bc95a6f486247a8790d7c5a70
   [2021-03-15 16:40:30,764] {docker.py:303} INFO - disco-20191127: Pulling from library/ubuntu
   ...
   [2021-03-15 16:40:34,948] {docker.py:298} INFO - Digest: sha256:60b619a302da327ddf40f8cd807f2a8aaccf842658010bc1b01da5c164ce59fa
   [2021-03-15 16:40:35,517] {docker.py:303} INFO - disco-20200114: Pulling from library/ubuntu
   [2021-03-15 16:40:35,987] {docker.py:298} INFO - Digest: sha256:2adeae829bf27a3399a0e7db8ae38d5adb89bcaf1bbef378240bc0e6724e8344
   [2021-03-15 16:40:36,511] {docker.py:303} INFO - disco: Pulling from library/ubuntu
   [2021-03-15 16:40:37,003] {docker.py:298} INFO - Digest: sha256:2adeae829bf27a3399a0e7db8ae38d5adb89bcaf1bbef378240bc0e6724e8344
   [2021-03-15 16:41:35,423] {docker.py:303} INFO - eoan-20200313: Pulling from library/ubuntu
   ...
   [2021-03-15 16:41:39,991] {docker.py:298} INFO - Digest: sha256:acad929ffeda349d0e8c311baf841cc5251d228db7fae4b3f43e54bddbb743de
   [2021-03-15 16:41:41,084] {docker.py:303} INFO - eoan-20200410: Pulling from library/ubuntu
   ...
   [2021-03-15 16:41:44,899] {docker.py:298} INFO - Digest: sha256:6859e3f7b06393b0f3e3a7af893ee882e40582fdce2d7e23d504ba191fada7fd
   [2021-03-15 16:41:48,113] {docker.py:303} INFO - focal-20191030: Pulling from library/ubuntu
   ...
   [2021-03-15 16:41:54,668] {docker.py:298} INFO - Digest: sha256:e8e70528bbd44c76610e3b093a0bcaa9d83d6eaa088f3106da368999ce880fa1
   ```
   For demonstration purposes, I've removed all intermediate tags between major OS versions.
   
   - Output of `docker images`
   You can follow this [pastebin link](https://pastebin.com/ECUWDJzF) to see it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #14809: DockerOperator downloads all tags found

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #14809:
URL: https://github.com/apache/airflow/issues/14809#issuecomment-799597061


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

