You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Tzu-ping Chung <tp...@astronomer.io.INVALID> on 2022/08/29 06:46:01 UTC

Should value passed via XComArg be templated?

Context: https://github.com/apache/airflow/issues/26016 <https://github.com/apache/airflow/issues/26016>

Currently if a value is passed to a downstream via XComArg, e.g.

@task
def get_templated_echo():
    return 'echo "The date is {{ ds }}"'

BashOperator(task_id="echo_task", bash_command=get_templated_echo())

Then this value does not go through Jinja2 templating, i.e. the BashOperator above would print The date is {{ ds }} instead of The date is 2022-08-29. The question here is, is this a conscious design decision, and is there reasoning behind it? Personally I can see arguments from both sides, but some context would be very appreciated.

And, if there wasn’t an explicit decision, should template-rendering an XComArg be supported?

TP

Re: Should value passed via XComArg be templated?

Posted by Jarek Potiuk <ja...@potiuk.com>.
Completing the sentence:  If we think that we need a feature, rather than
implementing it in Jinja, we should improve TaskFlow to add it if we cannot
do it now.

On Mon, Aug 29, 2022 at 9:18 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Agree with Ash,
>
> And I think if we add it, it will detract people from doing the same thing
> in a better way.
>
> I think what I really like about TaskFlow is that it actually does not
> need to use Jinja2 at all.
>
> Jinja2 as I see it, is a super-convoluted way of running (subset of)
> Python via strings. Not only dangerous from a security point of view but
> also has a number of deficiencies (no type validation really possible, no
> good support in IDES, limited set of filters to use and plenty of other
> limitations and problems).
>
> You could not avoid using Jinja2 in "classic" operators which are treated
> as black-box. And this is the only reason IMHO we use JINJA2 at all
> in Airflow.
>
> If you look at what features TaskFlow provides - one of its features is to
> make our code pure-Python, and Jinja2-less. Simply, if you want to
> dynamically derive a value - do it in Python code, not in JINJA2. If we
> think that we need a feature, rather than implementing it in Jinja. For me
> this is quite simple equation:
>
> * Classic Operators = Use Jinja2
> * Task Flow = Do not use Jinja2
>
> The example of Ash is a pure manifestation of that.
>
> J.
>
>
> On Mon, Aug 29, 2022 at 9:01 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> In this example I'd say it shouldn't be templated, as you can do it via
>>
>> @task
>> def get_templated_echo(ds):
>> return f'echo "The date is {ds}"'
>>
>> (This is nothing particular to XcomArg either, just the way Xcom had
>> always worked.)
>>
>> I don't think we can change the current/default behaviour without it
>> likely being a breaking change to someone's workflow.
>>
>> If there's enough desire and a good enough use case we can add a way to
>> make it something you can turn on via one way or another.
>>
>> -ash
>>
>>
>>
>> On 29 August 2022 07:46:01 BST, Tzu-ping Chung <tp...@astronomer.io.INVALID>
>> wrote:
>>>
>>> Context: https://github.com/apache/airflow/issues/26016
>>>
>>> Currently if a value is passed to a downstream via XComArg, e.g.
>>>
>>> @taskdef get_templated_echo():    return 'echo "The date is {{ ds }}"'
>>> BashOperator(task_id="echo_task", bash_command=get_templated_echo())
>>>
>>>
>>> Then this value does not go through Jinja2 templating, i.e. the
>>> BashOperator above would print The date is {{ ds }} instead of The date
>>> is 2022-08-29. The question here is, is this a conscious design
>>> decision, and is there reasoning behind it? Personally I can see arguments
>>> from both sides, but some context would be very appreciated.
>>>
>>> And, if there wasn’t an explicit decision, should template-rendering an
>>> XComArg be supported?
>>>
>>> TP
>>>
>>

Re: Should value passed via XComArg be templated?

Posted by Jarek Potiuk <ja...@potiuk.com>.
100%. Adding more helpers is the way to go.

Also a side comment (but quite relevant I think): I think this should be a
good start of making a real "DAG API". We need to - finally - define which
of the utils/functions/methods are the real API of DAGs that users and
providers should be able to use without the danger of breaking
compatibility and which are the internal API of Airflow.
Currently this is a bit "implicit" and blurry so far (based mostly on
common sense). There are some features (Variables, XCom, Connections,
BaseOperator class etc. that constiute the "API" and those are pretty
clear.

But also there are some utils/helpers already which we assume are something
"usable" by DAG/Operator authors but it is not stated anywhere.

At some point of time we need to have a clear separation between those two.
I think when we add more such helpers deliberately - this is a good
opportunity to make at least an inventory and separate documentation page
about "This is what you as DAG/Operator author can use without the fear of
breaking changes".

J.


On Mon, Aug 29, 2022 at 9:31 AM Tzu-ping Chung <tp...@astronomer.io.invalid>
wrote:

> I agree that templating is not the best solution for the contrived
> example. But there are more concrete use cases for the templating mechanism
> that can be useful, such as templating the content of a file (which is
> arguably a consequence of the much to magical nature of argument templating
> to begin with, of course…).
>
> Perhaps instead of trying to apply templating, we should add more XComArg
> helpers to perform argument pre-processing more explicitly? Some helpers
> are already being added for AIP-42 (to be released in 2.4.0) e.g.
> foo(f1().zip(f2())), and we can add more for reading a file, rendering a
> template, etc.
>
> Example:
>
>
> ## code.sh
> echo {{ ds }}
> ##
>
> ## dag.py
> @task
> def script():
>     # To contrived to make sense, but imagine a task that conditional
>     # returns a script to run.
>     return "code.sh"
>
> # The render_template_file helper reads out the file content and renders
> it as a template.
> BashOperator(task_id="run", bash_command=script().render_template_file())
> ##
>
>
> TP
>
>
> On 29 Aug 2022, at 15:18, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Agree with Ash,
>
> And I think if we add it, it will detract people from doing the same thing
> in a better way.
>
> I think what I really like about TaskFlow is that it actually does not
> need to use Jinja2 at all.
>
> Jinja2 as I see it, is a super-convoluted way of running (subset of)
> Python via strings. Not only dangerous from a security point of view but
> also has a number of deficiencies (no type validation really possible, no
> good support in IDES, limited set of filters to use and plenty of other
> limitations and problems).
>
> You could not avoid using Jinja2 in "classic" operators which are treated
> as black-box. And this is the only reason IMHO we use JINJA2 at all
> in Airflow.
>
> If you look at what features TaskFlow provides - one of its features is to
> make our code pure-Python, and Jinja2-less. Simply, if you want to
> dynamically derive a value - do it in Python code, not in JINJA2. If we
> think that we need a feature, rather than implementing it in Jinja. For me
> this is quite simple equation:
>
> * Classic Operators = Use Jinja2
> * Task Flow = Do not use Jinja2
>
> The example of Ash is a pure manifestation of that.
>
> J.
>
>
> On Mon, Aug 29, 2022 at 9:01 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> In this example I'd say it shouldn't be templated, as you can do it via
>>
>> @task
>> def get_templated_echo(ds):
>> return f'echo "The date is {ds}"'
>>
>> (This is nothing particular to XcomArg either, just the way Xcom had
>> always worked.)
>>
>> I don't think we can change the current/default behaviour without it
>> likely being a breaking change to someone's workflow.
>>
>> If there's enough desire and a good enough use case we can add a way to
>> make it something you can turn on via one way or another.
>>
>> -ash
>>
>>
>>
>> On 29 August 2022 07:46:01 BST, Tzu-ping Chung <tp...@astronomer.io.INVALID>
>> wrote:
>>>
>>> Context: https://github.com/apache/airflow/issues/26016
>>>
>>> Currently if a value is passed to a downstream via XComArg, e.g.
>>>
>>> @taskdef get_templated_echo():    return 'echo "The date is {{ ds }}"'
>>> BashOperator(task_id="echo_task", bash_command=get_templated_echo())
>>>
>>>
>>> Then this value does not go through Jinja2 templating, i.e. the
>>> BashOperator above would print The date is {{ ds }} instead of The date
>>> is 2022-08-29. The question here is, is this a conscious design
>>> decision, and is there reasoning behind it? Personally I can see arguments
>>> from both sides, but some context would be very appreciated.
>>>
>>> And, if there wasn’t an explicit decision, should template-rendering an
>>> XComArg be supported?
>>>
>>> TP
>>>
>>
>

Re: Should value passed via XComArg be templated?

Posted by Tzu-ping Chung <tp...@astronomer.io.INVALID>.
I agree that templating is not the best solution for the contrived example. But there are more concrete use cases for the templating mechanism that can be useful, such as templating the content of a file (which is arguably a consequence of the much to magical nature of argument templating to begin with, of course…).

Perhaps instead of trying to apply templating, we should add more XComArg helpers to perform argument pre-processing more explicitly? Some helpers are already being added for AIP-42 (to be released in 2.4.0) e.g. foo(f1().zip(f2())), and we can add more for reading a file, rendering a template, etc.

Example:


## code.sh
echo {{ ds }}
##

## dag.py
@task
def script():
    # To contrived to make sense, but imagine a task that conditional
    # returns a script to run.
    return "code.sh"

# The render_template_file helper reads out the file content and renders it as a template.
BashOperator(task_id="run", bash_command=script().render_template_file())
##


TP


> On 29 Aug 2022, at 15:18, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> Agree with Ash,
> 
> And I think if we add it, it will detract people from doing the same thing in a better way.
> 
> I think what I really like about TaskFlow is that it actually does not need to use Jinja2 at all. 
> 
> Jinja2 as I see it, is a super-convoluted way of running (subset of) Python via strings. Not only dangerous from a security point of view but also has a number of deficiencies (no type validation really possible, no good support in IDES, limited set of filters to use and plenty of other limitations and problems).
> 
> You could not avoid using Jinja2 in "classic" operators which are treated as black-box. And this is the only reason IMHO we use JINJA2 at all in Airflow.
> 
> If you look at what features TaskFlow provides - one of its features is to make our code pure-Python, and Jinja2-less. Simply, if you want to dynamically derive a value - do it in Python code, not in JINJA2. If we think that we need a feature, rather than implementing it in Jinja. For me this is quite simple equation:
> 
> * Classic Operators = Use Jinja2
> * Task Flow = Do not use Jinja2
> 
> The example of Ash is a pure manifestation of that.
> 
> J.
> 
> 
> On Mon, Aug 29, 2022 at 9:01 AM Ash Berlin-Taylor <ash@apache.org <ma...@apache.org>> wrote:
> In this example I'd say it shouldn't be templated, as you can do it via
> 
> @task
> def get_templated_echo(ds):
> return f'echo "The date is {ds}"'
> 
> (This is nothing particular to XcomArg either, just the way Xcom had always worked.)
> 
> I don't think we can change the current/default behaviour without it likely being a breaking change to someone's workflow.
> 
> If there's enough desire and a good enough use case we can add a way to make it something you can turn on via one way or another.
> 
> -ash 
> 
> 
> 
> On 29 August 2022 07:46:01 BST, Tzu-ping Chung <tp...@astronomer.io.INVALID> wrote:
> Context: https://github.com/apache/airflow/issues/26016 <https://github.com/apache/airflow/issues/26016>
> 
> Currently if a value is passed to a downstream via XComArg, e.g.
> 
> @task
> def get_templated_echo():
>     return 'echo "The date is {{ ds }}"'
> 
> BashOperator(task_id="echo_task", bash_command=get_templated_echo())
> 
> Then this value does not go through Jinja2 templating, i.e. the BashOperator above would print The date is {{ ds }} instead of The date is 2022-08-29. The question here is, is this a conscious design decision, and is there reasoning behind it? Personally I can see arguments from both sides, but some context would be very appreciated.
> 
> And, if there wasn’t an explicit decision, should template-rendering an XComArg be supported?
> 
> TP


Re: Should value passed via XComArg be templated?

Posted by Jarek Potiuk <ja...@potiuk.com>.
Agree with Ash,

And I think if we add it, it will detract people from doing the same thing
in a better way.

I think what I really like about TaskFlow is that it actually does not need
to use Jinja2 at all.

Jinja2 as I see it, is a super-convoluted way of running (subset of) Python
via strings. Not only dangerous from a security point of view but also has
a number of deficiencies (no type validation really possible, no good
support in IDES, limited set of filters to use and plenty of other
limitations and problems).

You could not avoid using Jinja2 in "classic" operators which are treated
as black-box. And this is the only reason IMHO we use JINJA2 at all
in Airflow.

If you look at what features TaskFlow provides - one of its features is to
make our code pure-Python, and Jinja2-less. Simply, if you want to
dynamically derive a value - do it in Python code, not in JINJA2. If we
think that we need a feature, rather than implementing it in Jinja. For me
this is quite simple equation:

* Classic Operators = Use Jinja2
* Task Flow = Do not use Jinja2

The example of Ash is a pure manifestation of that.

J.


On Mon, Aug 29, 2022 at 9:01 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> In this example I'd say it shouldn't be templated, as you can do it via
>
> @task
> def get_templated_echo(ds):
> return f'echo "The date is {ds}"'
>
> (This is nothing particular to XcomArg either, just the way Xcom had
> always worked.)
>
> I don't think we can change the current/default behaviour without it
> likely being a breaking change to someone's workflow.
>
> If there's enough desire and a good enough use case we can add a way to
> make it something you can turn on via one way or another.
>
> -ash
>
>
>
> On 29 August 2022 07:46:01 BST, Tzu-ping Chung <tp...@astronomer.io.INVALID>
> wrote:
>>
>> Context: https://github.com/apache/airflow/issues/26016
>>
>> Currently if a value is passed to a downstream via XComArg, e.g.
>>
>> @taskdef get_templated_echo():    return 'echo "The date is {{ ds }}"'
>> BashOperator(task_id="echo_task", bash_command=get_templated_echo())
>>
>>
>> Then this value does not go through Jinja2 templating, i.e. the
>> BashOperator above would print The date is {{ ds }} instead of The date
>> is 2022-08-29. The question here is, is this a conscious design
>> decision, and is there reasoning behind it? Personally I can see arguments
>> from both sides, but some context would be very appreciated.
>>
>> And, if there wasn’t an explicit decision, should template-rendering an
>> XComArg be supported?
>>
>> TP
>>
>

Re: Should value passed via XComArg be templated?

Posted by Ash Berlin-Taylor <as...@apache.org>.
In this example I'd say it shouldn't be templated, as you can do it via

@task
def get_templated_echo(ds):
    return f'echo "The date is {ds}"'

(This is nothing particular to XcomArg either, just the way Xcom had always worked.)

I don't think we can change the current/default behaviour without it likely being a breaking change to someone's workflow.

If there's enough desire and a good enough use case we can add a way to make it something you can turn on via one way or another.

-ash 



On 29 August 2022 07:46:01 BST, Tzu-ping Chung <tp...@astronomer.io.INVALID> wrote:
>Context: https://github.com/apache/airflow/issues/26016 <https://github.com/apache/airflow/issues/26016>
>
>Currently if a value is passed to a downstream via XComArg, e.g.
>
>@task
>def get_templated_echo():
>    return 'echo "The date is {{ ds }}"'
>
>BashOperator(task_id="echo_task", bash_command=get_templated_echo())
>
>Then this value does not go through Jinja2 templating, i.e. the BashOperator above would print The date is {{ ds }} instead of The date is 2022-08-29. The question here is, is this a conscious design decision, and is there reasoning behind it? Personally I can see arguments from both sides, but some context would be very appreciated.
>
>And, if there wasn’t an explicit decision, should template-rendering an XComArg be supported?
>
>TP