Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/11 03:32:28 UTC

[GitHub] [airflow] Chaho12 opened a new issue #20797: Python Operator should pass keyword arguments to callbacks for sake of convenience

Chaho12 opened a new issue #20797:
URL: https://github.com/apache/airflow/issues/20797


   ### Description
   
   The [Python Operator Execute](https://github.com/apache/airflow/blob/2.2.3/airflow/operators/python.py#L168-L176) method updates the context as in the code below, which works as the docstring describes: `:param op_kwargs: a dictionary of keyword arguments that will get unpacked in your function`.
   ```python
       def execute(self, context: Dict):
           context.update(self.op_kwargs)
           context['templates_dict'] = self.templates_dict
   
           self.op_kwargs = determine_kwargs(self.python_callable, self.op_args, context)
   
           return_value = self.execute_callable()
           self.log.info("Done. Returned value was: %s", return_value)
           return return_value
   ```
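
   For readers unfamiliar with `determine_kwargs`, here is a simplified, hypothetical sketch of what such signature-based filtering does. This is not the actual Airflow implementation (which lives in `airflow/utils/operator_helpers.py`); the names `filter_kwargs` and `my_callable` are made up for illustration:
   ```python
   import inspect

   def filter_kwargs(func, candidate_kwargs):
       """Simplified stand-in for Airflow's determine_kwargs: keep only
       the keyword arguments that `func` can actually accept."""
       sig = inspect.signature(func)
       accepts_var_kw = any(
           p.kind is inspect.Parameter.VAR_KEYWORD
           for p in sig.parameters.values()
       )
       if accepts_var_kw:
           # func takes **kwargs, so every key can be passed through.
           return dict(candidate_kwargs)
       return {k: v for k, v in candidate_kwargs.items() if k in sig.parameters}

   def my_callable(ds, my_param):
       return f"{ds}:{my_param}"

   # The merged context holds template variables plus op_kwargs entries;
   # only the keys my_callable declares survive the filtering.
   context = {"ds": "2022-01-11", "ti": "task-instance", "my_param": 42}
   print(filter_kwargs(my_callable, context))
   # {'ds': '2022-01-11', 'my_param': 42}
   ```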
   
   However, this description is at odds with the current behavior: `_run_finished_callback` uses the template context, which does not contain the additional kwargs set via `op_kwargs`.
   https://github.com/apache/airflow/blob/2.2.3/airflow/models/taskinstance.py#L1562-L1594
   
   ### Use case/motivation
   
   Because PythonOperator is based on BaseOperator, users can and frequently do add callbacks such as `on_retry_callback` or `on_success_callback` for their convenience. However, the callback is invoked with the [template context](https://github.com/apache/airflow/blob/2.2.3/airflow/models/taskinstance.py#L1572), which contains only the [context](https://github.com/apache/airflow/blob/2.2.3/airflow/models/taskinstance.py#L1777) variables, not the kwargs set when using PythonOperator.
   
   I find this inconsistent, because the description implies that options set via `op_kwargs` apply to both `python_callable` and the callbacks. In Airflow 1 there was at least the (now deprecated) `provide_context` parameter, which passed in an additional set of keyword arguments: one for each of the Jinja template variables plus a `templates_dict` argument. There was also a description that explicitly said `op_args` and `op_kwargs` are meant only for the Python callable: `Use the op_args and op_kwargs arguments to pass additional arguments to the Python callable.`
   
   Therefore, I suggest one of two changes:
   1. Clarify the description so that it is explicit that `op_kwargs` applies only to the `python_callable` parameter.
        - Users then have to write custom functions to add custom args to the context and pass them to callbacks.
   2. Make callbacks take `op_kwargs` into account, for example via the [determine_kwargs](https://github.com/apache/airflow/blob/2.2.3/airflow/utils/operator_helpers.py#L91-L118) function.
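
   As an illustration of the workaround mentioned under option 1, one hypothetical way to get extra values into a callback today is to pre-bind them with `functools.partial` instead of relying on `op_kwargs`. The names `notify` and `channel` below are made up for this sketch, and the callback is invoked here with a hand-built minimal context dict rather than by Airflow:
   ```python
   from functools import partial

   def notify(context, *, channel):
       # Airflow invokes callbacks with a single `context` dict; `channel`
       # is pre-bound below rather than coming from op_kwargs.
       return f"task {context.get('task_instance_key_str')} reported to {channel}"

   # Pre-binding the extra keyword argument leaves a callable with the
   # single-argument signature Airflow expects of a callback.
   on_success = partial(notify, channel="#alerts")

   # Simulated callback invocation with a minimal context dict:
   print(on_success({"task_instance_key_str": "my_dag__my_task__20220111"}))
   # task my_dag__my_task__20220111 reported to #alerts
   ```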
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #20797: Python Operator should pass keyword arguments to callbacks for sake of convenience

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #20797:
URL: https://github.com/apache/airflow/issues/20797#issuecomment-1009851004


   I don't see anything ironic in it when I consider the architecture of Airflow and the way it works, especially in order to achieve resiliency and robustness.
   
   I believe the main reason is that we sometimes cannot know what the args were, since they are dynamically passed (and potentially calculated via Jinja templates) when PythonOperator's execute() method is called. But for example, when a Python operator fails because it is forcefully killed, or is forcefully (via the UI) set to the "success" state, the callbacks will be executed in the DagProcessor/Scheduler, not in the worker/task where the execute() method was called. I believe at that point the op_kwargs and op_args are not (easily) available, and I am not sure those args can be easily passed in this case.
   
   If there are others who confirm that, then by all means please make a PR to update the documentation mentioning it. Airflow has almost 1900 contributors, and documentation from users who, like you, might have different assumptions or lack the broader context of how Airflow works under the hood is valuable. Such users are the best placed to find and document those assumptions and choices in the place that will most likely be best for other users with similar needs.






[GitHub] [airflow] potiuk commented on issue #20797: Python Operator should pass keyword arguments to callbacks for sake of convenience

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20797:
URL: https://github.com/apache/airflow/issues/20797#issuecomment-1019371541


   I simply recommend reading the Airflow code. It's the best way to learn how it works.





[GitHub] [airflow] Chaho12 commented on issue #20797: Python Operator should pass keyword arguments to callbacks for sake of convenience

Posted by GitBox <gi...@apache.org>.
Chaho12 commented on issue #20797:
URL: https://github.com/apache/airflow/issues/20797#issuecomment-1010588028


   Thank you for your reply. Can you recommend any books or references where I could start learning more about the architecture? A quick Google search turned up "The Complete Guide to Apache Airflow 2021" and "Data Pipelines with Apache Airflow".
   
   When you say `forcefully (via UI) set to "success" state, the callbacks will be executed in the DagProcessor/Scheduler`, is this the case where the operator has started (its state is running, scheduled, etc.) and the user forcefully changes the state via the UI?
   I tried changing the state from `no_status` to `success` (while it was waiting for another task to finish) and `on_success_callback` did not seem to be called.
   - Does defining my callable function within the same .py file, versus calling a callable method in a different .py file, have any effect in such a case?

