You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "potiuk (via GitHub)" <gi...@apache.org> on 2023/09/08 08:00:12 UTC

[GitHub] [airflow] potiuk commented on pull request #34198: Add instructions for dynamically extending operators

potiuk commented on PR #34198:
URL: https://github.com/apache/airflow/pull/34198#issuecomment-1711245962

   > I’m not sure I’d want my colleagues to do this. The standard way to extending a class is to subclass, not monkey patching something on the class.
   
   Actually I think this was a discovery we made recently, that this is possible and our users CAN do it this way. And I believe `many` users of ours find this as much easier way of changing definition of templated fields rather than extending classes (and it can be extended to other class fields without side-effect or dangers - I think).
   
   I personally find it as a very smart, simple and perfectly fine way of changing template fields. But maybe there are some drawbacks/side-effects I could not see.
   
   I believe it should have no side effect and should be a very straightforward way of changing the set of templated_fields and template_ext especially (that was always a pain when someone **just** missed one field in those or where some operator had a `template_ext`  configured to be `.json` but one of your fields had to have field that was ending with .json (see https://github.com/apache/airflow/issues/33694). 
   
   The "Literally" idea in #33694 @uranusjr  idea is cool to solve "template_ext" problem, and we might implement it to solve it  problem in 2.8.0. But this one has a solution for pre-2.8.0 for that.
   
   However, many users have been struggling with fields in template_ext, and we even toyed with the possibility of making all fields templated (which is not possible for back-compatibility reasons) - and we've always suggested to create custom classes for that.
   
   But the need to define a new class for `template_ext` is rather cumbersome for this case and has some ripple effects. 
   
   Compare:
   
   ``
   Operator.template_ext = (*Operator.template_ext, "another_field")
   ``
   
   vs.
   
   ```
   class MyOperator(Operator):
       template.ext = (*Operator.template_ext, "another_field")
   ```
   
   + (and this is the difficult part) you need to change to use MyOperator in all your DAGs. 
   
   Defining your own 'MyOperatora` class has also the drawback, that if you have some common utilities imported that create DAGs or parts of your DAGs, you will actually need to move the "MyOperator" to those common utiltities to use it and then if you would like to change the `template_ext` only in one or two of your DAGs, then it becomes very very messy.
   
   The monkeypatching solution is IMHO much cleaner, more elegant - and avoids the "common utils" mess - because we **guarantee** all DAGs are parsed in isolated subprocess, the scope of monkeypatching of the Operator is limited to single dag file. Full stop. It does not leak anywhere, there are no side-effects. Once the DAG is serialized, the operator's fields are marked as templated according to monkeypatched list and the webserver and all other components will just use the serialized version. I cannot find how this could cause other side effect.
   
   I think it's quite cool to suggest it to the users that they can do it (for sure for `template_ext` and `template_fields`). And it can be extended to other class fields as well (as suggested here) - the monkeypatching effect is really limited to "one dag file" and this is guaranteed by the way how DAGs are parsed and how tasks are executed and how all the other components use just serialized version of the DAG. 
   
   For now I don't see how this might be problematic, and it seems like an elegant way to solve quite a number of pain points our users have in a very elegant way. But I would love to hear what others think. Adding the usual suspects for these discussions (maybe I missed someone):
   
   @jedcunningham @dstandish @Taragolis @kaxil @pankajastro @o-nikolas @hussein-awala @eladkal @ephraimbuddy 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org