Posted to commits@airflow.apache.org by "andreyolv (via GitHub)" <gi...@apache.org> on 2024/02/22 14:42:51 UTC

[I] Policies do not run at the startup of each worker [airflow]

andreyolv opened a new issue, #37621:
URL: https://github.com/apache/airflow/issues/37621

   ### Description
   
   Policies do not run at the startup of each worker, only in dag processor.
   
   ### Use case/motivation
   
   I'm using some policies for DAGs and tasks. I noticed that the policies are executed continuously by the dag processor, which makes sense, but they are also executed on the workers when each task starts (using KubernetesExecutor).
   
   Why do they run on workers? Is there any way for policies not to run on workers, by configuration?
   
   We run Airflow with each project in its own namespace. If a policy needs extra permissions in Kubernetes, we can grant them to the dag processor alone. However, because tasks are executed in each project's namespace, the same permissions would be needed in every namespace. This complicates the use of policies, and each project image would also need the specific dependencies required to run them.
   
   
   ### Related issues
   
   I didn't find any.
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Policies do not run at the startup of each worker [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #37621:
URL: https://github.com/apache/airflow/issues/37621#issuecomment-1967929973

   Policies need to be run everywhere a task is parsed, because they potentially modify the task instance. This is intended, and how it could vary per execution is nicely explained by @SamWheating. Converting to a discussion if needed.
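
To illustrate why policies are applied wherever a file is parsed, here is a toy model with no Airflow imports. `ToyTask`, `toy_task_policy`, and `parse_dag_file` are hypothetical names invented for this sketch, not Airflow APIs. It only shows that a policy which mutates a task has to run in every process that rebuilds the task from the DAG file; otherwise the two processes can end up with different views of the same task.

```python
from dataclasses import dataclass


@dataclass
class ToyTask:
    """Stand-in for a task object; not an Airflow class."""
    task_id: str
    retries: int = 0


def toy_task_policy(task: ToyTask, on_worker: bool) -> None:
    """Hypothetical policy that enforces a minimum retry count."""
    if on_worker:
        # Early exit: the mutation below is skipped in this process.
        return
    task.retries = max(task.retries, 2)


def parse_dag_file(on_worker: bool) -> ToyTask:
    """Simulate parsing: the task is rebuilt from scratch, then the policy runs."""
    task = ToyTask(task_id="extract")
    toy_task_policy(task, on_worker)
    return task


processor_view = parse_dag_file(on_worker=False)
worker_view = parse_dag_file(on_worker=True)
print(processor_view.retries, worker_view.retries)
```

In this model the two parses disagree about `retries`. A policy that only validates or rejects tasks would not have this problem, which is the kind of policy where skipping on workers seems safest.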




Re: [I] Policies do not run at the startup of each worker [airflow]

Posted by "SamWheating (via GitHub)" <gi...@apache.org>.
SamWheating commented on issue #37621:
URL: https://github.com/apache/airflow/issues/37621#issuecomment-1967305008

   From my understanding, the worker process will sometimes (depending on dag serialization, etc) need to reprocess DAG files. This means that the task and dag policies can be reapplied at run-time, since they're applied every time a file is parsed.
   
   > Is there any way for policies not to run on workers, by configuration?
   
   There are definitely some (slightly hacky) ways around this: could you just add some logic to your policy to exit early if it detects that it's running on a worker? Some ideas come to mind:
   
   
   1) Check the dag parsing context to see if the policy is being run within a worker process:
   ```python
   from airflow.utils.dag_parsing_context import get_parsing_context
   
   def task_policy(task) -> None:
   
     if get_parsing_context().task_id is not None: # this will only be true at the time of task execution
       return
   
     print("Running the rest of the policy..")
   ```
   
   2) Set a variable in your kubernetes executor pod template file, and then check that in your policy:
   ```python
   import os
   
   def task_policy(task) -> None:
   
     if os.environ.get('YOUR_VARIABLE_HERE') is not None:  # set only in the worker pod template
       return
   
     print("Running the rest of the policy..")
   ```
   
   I guess we could add some sort of configuration to automate this, but in my opinion running policies at parse time is expected behaviour and shouldn't introduce additional complexity when running on executors, aside from rare cases like this.
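
The two checks above could be combined into one small helper so the decision is easy to unit-test without a running Airflow. This is a minimal sketch, not an official Airflow feature: `SKIP_CLUSTER_POLICIES`, `should_skip_policy`, and `current_parsing_task_id` are hypothetical names, and the import path assumes an Airflow 2.x release that ships `airflow.utils.dag_parsing_context`.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; it would be set only in the worker pod template.
SKIP_ENV_VAR = "SKIP_CLUSTER_POLICIES"


def should_skip_policy(parsing_task_id: Optional[str], env: Mapping[str, str]) -> bool:
    """Return True when either signal suggests the code is running on a worker."""
    if parsing_task_id is not None:
        # A task_id in the parsing context means a task is being executed.
        return True
    return env.get(SKIP_ENV_VAR) is not None


def current_parsing_task_id() -> Optional[str]:
    """Read task_id from Airflow's parsing context, or None if unavailable."""
    try:
        from airflow.utils.dag_parsing_context import get_parsing_context
    except ImportError:
        # Airflow (or this module) is not importable here; assume no task context.
        return None
    return get_parsing_context().task_id


def task_policy(task) -> None:
    if should_skip_policy(current_parsing_task_id(), os.environ):
        return
    print("Running the rest of the policy..")
```

Keeping the decision in a pure function means it can be checked with plain dictionaries in place of the real environment, for example `should_skip_policy(None, {"SKIP_CLUSTER_POLICIES": "1"})`.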




Re: [I] Policies do not run at the startup of each worker [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #37621: Policies do not run at the startup of each worker
URL: https://github.com/apache/airflow/issues/37621

