Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/29 01:08:11 UTC

[GitHub] [airflow] oneturkmen opened a new issue, #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

oneturkmen opened a new issue, #27358:
URL: https://github.com/apache/airflow/issues/27358

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   I have a bash sensor defined as follows:
   
   ```python
   foo_sensor_task = BashSensor(
       task_id="foo_task",
       poke_interval=3600,
       bash_command="python -m foo.run",
       retries=0,
       executor_config={
           "pod_template_file: "path-to-file-yaml",
           "pod_override": k8s.V1Pod(
                 spec=k8s.V1PodSpec(
                     containers=[
                         k8s.V1Container(name="base, image="foo-image", args=["abc"])
                     ]
                 )
           )
       }
   )
   ```
   
   The entrypoint command in the `foo-image` is `python -m foo.run`. However, when I deploy the image onto OpenShift (Kubernetes), the command turns out to be the following:
   
   ```bash
   python -m foo.run airflow tasks run foo_dag foo_sensor_task manual__2022-10-28T21:08:39+00:00 ...
   ```
   
   which is wrong.
   
   ### What you think should happen instead
   
   I expect the supplied `args` (see the V1Container `args` value above) to override the default, so the command should be:
   
   ```bash
   python -m foo.run abc
   ```
   
   and **not**:
   ```bash
   python -m foo.run airflow tasks run foo_dag foo_sensor_task manual__2022-10-28T21:08:39+00:00 ...
   ```
   
   
   ### How to reproduce
   
   To reproduce the issue, create a simple DAG with a sensor as defined above, use a sample image, and try to override the args. I cannot provide the actual code due to an NDA.
   
   ### Operating System
   
   RHLS 7.9
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==2.4.0
   apache-airflow-providers-cncf-kubernetes==2.1.0
   apache-airflow-providers-ftp==2.0.1
   apache-airflow-providers-http==2.0.1
   apache-airflow-providers-imap==2.0.1
   apache-airflow-providers-mysql==2.1.1
   apache-airflow-providers-sqlite==2.0.1
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   N/A
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jedcunningham commented on issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #27358:
URL: https://github.com/apache/airflow/issues/27358#issuecomment-1299025869

   Hey @oneturkmen!
   
   That is expected behavior, as the worker uses that to know what task to run. May I ask what you are trying to ultimately achieve by overwriting args?
   
   This should have been documented though, so I've opened #27450 to do that. Thanks.




[GitHub] [airflow] oneturkmen commented on issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by GitBox <gi...@apache.org>.
oneturkmen commented on issue #27358:
URL: https://github.com/apache/airflow/issues/27358#issuecomment-1299376187

   @jedcunningham we wanted a BashSensor task that pings an external service to check whether a file has been generated. If the file isn't there yet, we keep pinging for some time, and if it's still not ready after that, we fail the task.
   
   > That is expected behavior, as the worker uses that to know what task to run.
   
   I did not expect that, because we use `KubernetesPodOperator`, where we supply our custom image with a Python base (i.e., `FROM python:3.7` rather than `FROM airflow:2.2.2`), and it works as needed. You can see in the code snippets of the KubernetesPodOperator docs [here](https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html#kubernetespodoperator) that we are able to override the image along with the command, which does not have the `airflow tasks run` command appended. Maybe I am missing something here.




[GitHub] [airflow] devdattakulkarni commented on issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by "devdattakulkarni (via GitHub)" <gi...@apache.org>.
devdattakulkarni commented on issue #27358:
URL: https://github.com/apache/airflow/issues/27358#issuecomment-1569995381

   @jedcunningham Hello. I stumbled upon this issue while debugging an error. I watched the video you mentioned above and could not find an answer, so I thought I'd ask here. I hope that's okay.
   
   So I have a K8s executor with a custom image. I trigger the dag from Airflow UI by passing in custom parameters using the "trigger DAG w/config" option. I understand these parameters will be accessible to the task via the dag_run dictionary. But I am not able to access the dag_run dictionary. Below is a brief snippet of my task definition. 
   ```python
   my_executor_config = {
       "pod_override": k8s.V1Pod(
           spec=k8s.V1PodSpec(
               containers=[
                   k8s.V1Container(
                       name="base",
                       image="custom-image",
                       command=[
                           "python3",
                           "count_exposure_points.py",
                           "{{ dag_run.conf['bucket'] }}",
                           "{{ dag_run.conf['prefix'] }}",
                       ],
                   )
               ]
           )
       )
   }

   @task(executor_config=my_executor_config)
   def my_task():
       print_stuff()

   my_task()
   ```
   I tried dereferencing dag_run as "{{ dag_run }}" (with double quotes) -- but this just passes the literal string "{{ dag_run }}" to the command. Without the double quotes, the DAG fails to load with a `NameError` (`dag_run` is an unknown name).
   
   Any pointers/suggestions on how to access dag_run in a KubernetesExecutor will be very helpful.
   
   Thank you!
   
   




[GitHub] [airflow] jedcunningham commented on issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #27358:
URL: https://github.com/apache/airflow/issues/27358#issuecomment-1300686637

   I think you are misunderstanding what KubernetesExecutor is actually doing. KE spins up an Airflow worker pod for every task. In your case, it'll spin up a pod and say "Airflow, run task 'foo_task' for dag 'foo_dag' run_id 'manual__...'" (which matches the args KE sets). That worker will then run your bash_command (or do whatever else you've asked it to do).
   
   KPO is a different situation. The conceptual "kubectl create pod" is replacing the bash_command, but it still runs from an Airflow worker.
   
   Short version: put all your task-specific logic in bash_command when using a BashSensor. Bonus: this keeps it portable between executors!
   
   I actually gave a talk that covered this at Airflow Summit this year, it's short so might be worth a watch: https://youtu.be/H8JjhiVGOlg




[GitHub] [airflow] jedcunningham commented on issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by "jedcunningham (via GitHub)" <gi...@apache.org>.
jedcunningham commented on issue #27358:
URL: https://github.com/apache/airflow/issues/27358#issuecomment-1570926986

   Hi @devdattakulkarni,
   
   Generally you should ask this type of stuff on [our slack](https://apache-airflow-slack.herokuapp.com/) or in a [discussion](https://github.com/apache/airflow/discussions) instead of old issues, even if they are sorta related like this one.
   
   Couple things:
   
   - Don't use executor config to run a separate python script like you are trying to. KE still needs to run the Airflow worker. Do the import and run whatever you are trying to do in your `my_task` function, or use BashOperator.
   - If you stick with taskflow, you can access conf via the context, as described in the [taskflow tutorial](https://airflow.apache.org/docs/apache-airflow/stable/tutorial/taskflow.html#accessing-context-variables-in-decorated-tasks). The templating you were attempting doesn't work everywhere, only in attributes listed in `template_fields` on your operator. See the [Jinja templating section of the operator docs](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/operators.html#concepts-jinja-templating) for details.
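
The context-based approach above can be sketched in plain Python (no Airflow imports; `DagRun` here is a hypothetical stub standing in for the object Airflow injects into the task context):

```python
# Hypothetical stub mimicking the `dag_run` object that Airflow injects
# into the task context; only the `conf` attribute is modeled here.
class DagRun:
    def __init__(self, conf):
        self.conf = conf

# Inside a real @task-decorated function, Airflow supplies **context.
# The body reads parameters from dag_run.conf instead of trying to
# template them into executor_config.
def my_task(**context):
    conf = context["dag_run"].conf
    return [conf.get("bucket"), conf.get("prefix")]

print(my_task(dag_run=DagRun({"bucket": "my-bucket", "prefix": "data/"})))
```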




[GitHub] [airflow] boring-cyborg[bot] commented on issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #27358:
URL: https://github.com/apache/airflow/issues/27358#issuecomment-1295672126

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] oneturkmen commented on issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by GitBox <gi...@apache.org>.
oneturkmen commented on issue #27358:
URL: https://github.com/apache/airflow/issues/27358#issuecomment-1300724007

   Thanks Jed. That makes much more sense now.
   




[GitHub] [airflow] devdattakulkarni commented on issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by "devdattakulkarni (via GitHub)" <gi...@apache.org>.
devdattakulkarni commented on issue #27358:
URL: https://github.com/apache/airflow/issues/27358#issuecomment-1570991556

   @jedcunningham Ack on using Slack or discussions for asking questions. Next time will do that. 
   
   Thanks a lot for the detailed answers. They are helpful and confirm the solution that I stumbled upon just an hour ago via trial and error.
   
   So now I am creating my custom image by inheriting from the Airflow Worker image. Then in the executor_config, I am not defining any command.  Instead, in my_task, I am using context to get the dag_run parameters and then using Python subprocess to invoke the actual command. This setup is working now. 
   
   Hopefully, this explanation can help someone else who runs into this issue.
   
   Thank you for the quick reply 💯 
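
A rough sketch of the setup described above, with hypothetical names (`count_exposure_points.py` is taken from the earlier snippet); the real task would pass the returned command to `subprocess.run`:

```python
# Hypothetical sketch: build the command from parameters read out of
# dag_run.conf in the task context, then hand it to subprocess.
def build_command(conf):
    return [
        "python3",
        "count_exposure_points.py",
        conf.get("bucket", ""),
        conf.get("prefix", ""),
    ]

cmd = build_command({"bucket": "my-bucket", "prefix": "data/"})
print(cmd)
# A real task would then run: subprocess.run(cmd, check=True)
```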
   




[GitHub] [airflow] oneturkmen commented on issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by GitBox <gi...@apache.org>.
oneturkmen commented on issue #27358:
URL: https://github.com/apache/airflow/issues/27358#issuecomment-1295674800

   To follow up on the above, I think the issue is somewhere in these lines: https://github.com/apache/airflow/blob/main/airflow/executors/kubernetes_executor.py#L307-L333
   
   But it is precisely here: https://github.com/apache/airflow/blob/5df1d6ec20677fee23a21bbbf13a7293d241a2f7/airflow/executors/kubernetes_executor.py#L330
   
   The line above always overwrites `args` with the worker command, which is basically `airflow tasks run`. Why do we pass the command as `args`? Is that expected behavior?
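
A rough pure-Python sketch of the behavior that line produces (not actual Airflow code; names are illustrative):

```python
# Illustrative sketch: the executor builds the worker pod spec and then
# sets the base container's args to the `airflow tasks run ...` command,
# so any args supplied via pod_override are replaced rather than kept.
def build_worker_args(dag_id, task_id, run_id, override_args=None):
    ke_args = ["airflow", "tasks", "run", dag_id, task_id, run_id]
    return ke_args  # override_args is effectively ignored

print(build_worker_args("foo_dag", "foo_task", "manual__2022-10-28T21:08:39+00:00", ["abc"]))
```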




[GitHub] [airflow] dstandish closed issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container

Posted by GitBox <gi...@apache.org>.
dstandish closed issue #27358: Airflow 2.2.2 pod_override does not override `args` of V1Container
URL: https://github.com/apache/airflow/issues/27358

