Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/26 12:18:57 UTC

[GitHub] [airflow] sanje2v opened a new issue #21832: Unmarshalling of function with '@task.docker()' decorator fails if 'python' alias is not defined in image

sanje2v opened a new issue #21832:
URL: https://github.com/apache/airflow/issues/21832


   ### Apache Airflow version
   
   2.2.4 (latest released)
   
   ### What happened
   
   I am using the Airflow 2.2.4 Docker image to run a DAG, `test_dag.py`, defined as follows:
   
   ```python
   from airflow.decorators import dag, task
   from airflow.utils import dates
   
   
   @dag(schedule_interval=None,
        start_date=dates.days_ago(1),
        catchup=False)
   def test_dag():
   
       @task.docker(image='company/my-repo',
                    api_version='auto',
                    docker_url='tcp://docker-socket-proxy:2375/',
                    auto_remove=True)
       def docker_task(inp):
           print(inp)
           return inp+1
   
       @task.python()
       def python_task(inp):
           print(inp)
   
       out = docker_task(10)
       python_task(out)
   
   
   _ = test_dag()
   ```
   
   The Dockerfile for 'company/my-repo' is as follows:
   
   ```dockerfile
   FROM nvidia/cuda:11.2.2-runtime-ubuntu20.04
   
   USER root
   ARG DEBIAN_FRONTEND=noninteractive
   
   RUN apt-get update && apt-get install -y python3 python3-pip
   ```
   
   ### What you expected to happen
   
   I expected the DAG logs for `docker_task()` and `python_task()` to have 10 and 11 as output respectively.
   
   Instead, the bootstrap command that Airflow runs inside the `company/my-repo` container to unmarshal the function definition of `docker_task()` (passed base64-encoded via the `__PYTHON_SCRIPT` environment variable) makes an **incorrect assumption**: that the command `python` exists as an alias for either `/usr/bin/python2` or `/usr/bin/python3`. On many Linux distributions, Ubuntu 20.04 included, users must explicitly run either `python2` or `python3`, and `python` is NOT defined even when `python3` has been installed with the `apt` package manager.
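   To make the mechanism concrete, here is a minimal local sketch of the round trip performed by the one-liner that appears in the error log, assuming only what that line shows: the script travels base64-encoded in `__PYTHON_SCRIPT`, and a one-line Python command decodes it to a file. `decode_command` and `round_trip` are hypothetical helpers, not Airflow APIs, and the child runs with `sys.executable` on the host rather than inside a container.

```python
from __future__ import annotations

import base64
import os
import subprocess
import sys
import tempfile


def decode_command(interpreter: str, target_path: str) -> list[str]:
    """argv that decodes $__PYTHON_SCRIPT and writes it to target_path.

    Mirrors the one-liner in the task log; `interpreter` stands in for the
    command name the container must be able to resolve on its PATH.
    """
    snippet = (
        "import base64, os;"
        'x = base64.b64decode(os.environ["__PYTHON_SCRIPT"]);'
        f"f = open({target_path!r}, 'wb'); f.write(x); f.close()"
    )
    return [interpreter, "-c", snippet]


def round_trip(script: str, interpreter: str = sys.executable) -> str:
    """Encode `script` into __PYTHON_SCRIPT, decode it in a child process,
    and return what the child wrote to disk."""
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "script.py")
        env = {**os.environ, "__PYTHON_SCRIPT": encoded}
        # Raises FileNotFoundError if `interpreter` is not on PATH: the
        # no-shell equivalent of "bash: python: command not found".
        subprocess.run(decode_command(interpreter, target), env=env, check=True)
        with open(target, encoding="utf-8") as f:
            return f.read()


if __name__ == "__main__":
    print(round_trip("print(10 + 1)\n"), end="")
```

   The decoding step itself is interpreter-agnostic; the only thing that can fail is resolving the interpreter's name.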
   
   As a workaround for now, the error can be avoided by adding the following line to the `Dockerfile` after the python3 package installation:
   `RUN apt-get install -y python-is-python3`
   
   But this should NOT be a requirement.
   
   `Dockerfile`s based on the official `python` base images do not suffer from this problem, because those images define the `python` alias.
   
   
   The error logged is:
   ```
   [2022-02-26, 11:30:47 UTC] {docker.py:258} INFO - Starting docker container from image company/my-repo
   [2022-02-26, 11:30:48 UTC] {docker.py:320} INFO - + python -c 'import base64, os;x = base64.b64decode(os.environ["__PYTHON_SCRIPT"]);f = open("/tmp/script.py", "wb"); f.write(x);'
   [2022-02-26, 11:30:48 UTC] {docker.py:320} INFO - bash: python: command not found
   [2022-02-26, 11:30:48 UTC] {taskinstance.py:1700} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1329, in _run_raw_task
       self._execute_task_with_callbacks(context)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1455, in _execute_task_with_callbacks
       result = self._execute_task(context, self.task)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1511, in _execute_task
       result = execute_callable(context=context)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/docker/decorators/docker.py", line 117, in execute
       return super().execute(context)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/decorators/base.py", line 134, in execute
       return_value = super().execute(context)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 390, in execute
       return self._run_image()
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 265, in _run_image
       return self._run_image_with_mounts(self.mounts + [tmp_mount], add_tmp_variable=True)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 324, in _run_image_with_mounts
       raise AirflowException('docker container failed: ' + repr(result) + f"lines {res_lines}")
   airflow.exceptions.AirflowException: docker container failed: {'Error': None, 'StatusCode': 127}lines + python -c 'import base64, os;x = base64.b64decode(os.environ["__PYTHON_SCRIPT"]);f = open("/tmp/script.py", "wb"); f.write(x);'
   bash: python: command not found
   ```
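   The `'StatusCode': 127` in the log is the POSIX shell's exit status for "command not found". A minimal sketch to confirm this locally, assuming a POSIX `sh` on the host; `exit_status_for` is a hypothetical helper and the interpreter name is deliberately bogus.

```python
import shutil
import subprocess

# Deliberately absent from PATH, playing the role of `python` in the image.
MISSING = "python-interpreter-that-does-not-exist"


def exit_status_for(command: str) -> int:
    """Run `command` through `sh -c` and return the shell's exit status."""
    return subprocess.run(["sh", "-c", command], capture_output=True).returncode


if __name__ == "__main__":
    print(shutil.which(MISSING))  # None: the name does not resolve
    # Same shape as the failing bootstrap line; the shell answers with 127.
    print(exit_status_for(f"{MISSING} -c 'pass'"))  # 127
```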
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Ubuntu 20.04 WSL 2
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #21832: Unmarshalling of function with '@task.docker()' decorator fails if 'python' alias is not defined in image

potiuk closed issue #21832:
URL: https://github.com/apache/airflow/issues/21832


   





[GitHub] [airflow] potiuk commented on issue #21832: Unmarshalling of function with '@task.docker()' decorator fails if 'python' alias is not defined in image

potiuk commented on issue #21832:
URL: https://github.com/apache/airflow/issues/21832#issuecomment-1053526415


   > I can add this as a parameter and leave default to `python3`
   
   Yeah, indeed. It's rather simple.





[GitHub] [airflow] subkanthi commented on issue #21832: Unmarshalling of function with '@task.docker()' decorator fails if 'python' alias is not defined in image

subkanthi commented on issue #21832:
URL: https://github.com/apache/airflow/issues/21832#issuecomment-1053243805


   I can add this as a parameter and leave default to `python3`





[GitHub] [airflow] boring-cyborg[bot] commented on issue #21832: Unmarshalling of function with '@task.docker()' decorator fails if 'python' alias is not defined in image

boring-cyborg[bot] commented on issue #21832:
URL: https://github.com/apache/airflow/issues/21832#issuecomment-1052102983


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] uranusjr commented on issue #21832: Unmarshalling of function with '@task.docker()' decorator fails if 'python' alias is not defined in image

uranusjr commented on issue #21832:
URL: https://github.com/apache/airflow/issues/21832#issuecomment-1052328755


   Maybe we should add an argument `python` that allows customising the command to use.
   
   Also, considering [PEP 394](https://www.python.org/dev/peps/pep-0394/), perhaps we should default to use `python3` instead of `python`.
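   A minimal sketch of the suggested argument, assuming it only changes the interpreter name at the start of the bootstrap line shown in the task log. `bootstrap_command` and its `python_command` parameter are hypothetical, not the Docker provider's actual API; `shlex.quote` keeps an unusual interpreter path from breaking the shell line.

```python
import shlex

# The decoding one-liner exactly as it appears in the task log.
BOOTSTRAP = (
    "import base64, os;"
    'x = base64.b64decode(os.environ["__PYTHON_SCRIPT"]);'
    'f = open("/tmp/script.py", "wb"); f.write(x);'
)


def bootstrap_command(python_command: str = "python3") -> str:
    """Build the shell line that decodes __PYTHON_SCRIPT in the container.

    The default follows PEP 394 and uses `python3` instead of `python`.
    """
    return f"{shlex.quote(python_command)} -c {shlex.quote(BOOTSTRAP)}"
```

   Calling `bootstrap_command("python")` reproduces the failing line from the log; omitting the argument yields the same line starting with `python3`.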





[GitHub] [airflow] potiuk commented on issue #21832: Unmarshalling of function with '@task.docker()' decorator fails if 'python' alias is not defined in image

potiuk commented on issue #21832:
URL: https://github.com/apache/airflow/issues/21832#issuecomment-1052354653


   I am for using `python3`. I've already changed this in some of our scripts, and it seems a very reasonable approach. I don't think there is ever a case where `python3` would not be available if Python is installed, and adding configuration would complicate things. It's better to make having `python3` on the PATH a hard requirement for `task.docker` images.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org