You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Daniel Standish <dp...@gmail.com> on 2021/11/10 18:13:44 UTC

[DISCUSS] non-json-serializable params

Prior to 2.2.0, you could use non-json-serializable params in a dag. Here's
an example with `set`:

@dag.task(params={"a": {1, 2, 3}, "b": [3, 4, 5]})
def set_param(intersection):
    print(intersection)


set_param("{{ params.a.intersection(params.b).pop() }}")

In 2.2.0 this was broken, and in 2.2.2rc1 it *should be* fixed
<https://github.com/apache/airflow/pull/19267>.

There's a problem though.  An important part of params is the ability to
override them when triggering a dag from the web UI or CLI.  But params
supplied through either mechanism should be json-serializable.

So on one hand we allow arbitrary params, and on the other hand we require
them to be json serializable.  We can see how this causes a problem in the
above example: if you provide `a` as a `list` in the UI, it won't have the
method `intersection`.

So the question: should we deprecate non-json-serializable params?

My feeling is yes.  But I'm not sure how widely this flexibility is used in
the wild.

Re: [DISCUSS] non-json-serializable params

Posted by Kaxil Naik <ka...@gmail.com>.
Yup, sounds good

On Thu, Nov 11, 2021 at 10:57 PM Daniel Standish <dp...@gmail.com>
wrote:

> OK so we have a consensus of 3.  Should I create a voting thread for
> this?  Namely, formally deprecating non-json-serializable params, for
> removal in 3.0?
>
> On Thu, Nov 11, 2021 at 2:40 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Indeed - if we want to really "deprecate" (and drop in 3.0) support
>> for non-serializable params, then JSON serialization is the way to go.
>>
>> The only "benefit" of using YAML is "set" support but if we are going
>> to "deprecate" non-serializable params, then we can easily include
>> "set" in being 'non-serializable" and use JSON. There is a good reason
>> why JSON does not support sets (because in serialized form it is
>> exactly the same as list - there is no difference, really)
>>
>> J.
>>
>> On Thu, Nov 11, 2021 at 9:24 PM Kaxil Naik <ka...@gmail.com> wrote:
>> >
>> > -1 for breaking it again. We should go ahead with a deprecation route.
>> JSON serializable makes sense, I am not fully convinced if YAML
>> serializable is any better !
>> >
>> > Another note is the current params which Daniel fixed now use
>> Serialization from our DAG Serialization -
>> https://github.com/apache/airflow/blob/7622f5e08261afe5ab50a08a6ca0804af8c7c7fe/airflow/serialization/serialized_objects.py#L289-L330
>> so it currently supports Timedelta, Timezone, Datetime, Tuple etc objects.
>> >
>> > But I agree with Daniel that we should deprecate and only support JSON
>> Serializable objects to make it fully featured like overriding it via CLI,
>> API and Webserver.
>> >
>> > Regards,
>> > Kaxil
>> >
>> > On Thu, Nov 11, 2021 at 6:51 PM Daniel Standish <dp...@gmail.com>
>> wrote:
>> >>
>> >> Yeah I agree with you.
>> >>
>> >> The one other thing I'll mention is the other use case that was raised
>> in an issue was `datetime` which like set is also not json-serializable,
>> but unlike set would probably not be yaml-serializable.
>> >>
>> >> But yeah let's see if others can help establish a consensus.
>> >>
>> >> Small note: another thing sortof in support of your position is that,
>> if you can't override the param from UI and CLI and these other means
>> (because it's not expecting something that can be serialize that way), then
>> you don't even need it to be a `param` but it could just as easily be an
>> operator or task arg instead.  I.e. if  you're staying in python you can
>> use python; but what's special about params is they can be set from
>> outside.  The other side of this though is that probably arbitrary dag
>> params probably _did_ work with trigger dag run operator.
>> >>
>> >>
>> >> On Thu, Nov 11, 2021 at 10:06 AM Jarek Potiuk <ja...@potiuk.com>
>> wrote:
>> >>>
>> >>> > So you would say in 2.2.3 we "break" that again?  Not wait for 3.0
>> because, even if it was perhaps an accident, support was there?
>> >>>
>> >>> Yep. If others agree this is the way to go, I'd be happy to. We had
>> >>> some other changes that worked "accidentally" but were never stated
>> >>> that they work this way. I think it's a pretty good assumption (even
>> >>> if it is implicit) that "params" set for dag triggering are "data" and
>> >>> not "code". It could be python callable of course, but I think it's
>> >>> kinda "abuse" - especially that it excludes triggering via CLI/UI.
>> >>>
>> >>> The thing is that we do not "specify" what is our "stable  API" and
>> >>> what is not also in many other places, there is a certain ambiguity
>> >>> for some of them. Of course it's not only whether it is "specified" or
>> >>> "not", it's also much more "whether a lot of people could interpret
>> >>> and use it in this way". I think (but maybe others can chime in)  - it
>> >>> would be reasonable to assume that using callables or other Python
>> >>> Code is expected when we already have:
>> >>> a) CLI with string/JSON input
>> >>> b) UI with string/JSON input
>> >>> c) ability of using JSON-schema to actually verify the parameters
>> >>> (this last one actually shows a clear intention of having those
>> >>> parameters "data only")..
>> >>>
>> >>> J.
>>
>

Re: [DISCUSS] non-json-serializable params

Posted by Daniel Standish <dp...@gmail.com>.
OK so we have a consensus of 3.  Should I create a voting thread for this?
Namely, formally deprecating non-json-serializable params, for removal in
3.0?

On Thu, Nov 11, 2021 at 2:40 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Indeed - if we want to really "deprecate" (and drop in 3.0) support
> for non-serializable params, then JSON serialization is the way to go.
>
> The only "benefit" of using YAML is "set" support but if we are going
> to "deprecate" non-serializable params, then we can easily include
> "set" in being 'non-serializable" and use JSON. There is a good reason
> why JSON does not support sets (because in serialized form it is
> exactly the same as list - there is no difference, really)
>
> J.
>
> On Thu, Nov 11, 2021 at 9:24 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > -1 for breaking it again. We should go ahead with a deprecation route.
> JSON serializable makes sense, I am not fully convinced if YAML
> serializable is any better !
> >
> > Another note is the current params which Daniel fixed now use
> Serialization from our DAG Serialization -
> https://github.com/apache/airflow/blob/7622f5e08261afe5ab50a08a6ca0804af8c7c7fe/airflow/serialization/serialized_objects.py#L289-L330
> so it currently supports Timedelta, Timezone, Datetime, Tuple etc objects.
> >
> > But I agree with Daniel that we should deprecate and only support JSON
> Serializable objects to make it fully featured like overriding it via CLI,
> API and Webserver.
> >
> > Regards,
> > Kaxil
> >
> > On Thu, Nov 11, 2021 at 6:51 PM Daniel Standish <dp...@gmail.com>
> wrote:
> >>
> >> Yeah I agree with you.
> >>
> >> The one other thing I'll mention is the other use case that was raised
> in an issue was `datetime` which like set is also not json-serializable,
> but unlike set would probably not be yaml-serializable.
> >>
> >> But yeah let's see if others can help establish a consensus.
> >>
> >> Small note: another thing sortof in support of your position is that,
> if you can't override the param from UI and CLI and these other means
> (because it's not expecting something that can be serialize that way), then
> you don't even need it to be a `param` but it could just as easily be an
> operator or task arg instead.  I.e. if  you're staying in python you can
> use python; but what's special about params is they can be set from
> outside.  The other side of this though is that probably arbitrary dag
> params probably _did_ work with trigger dag run operator.
> >>
> >>
> >> On Thu, Nov 11, 2021 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>
> >>> > So you would say in 2.2.3 we "break" that again?  Not wait for 3.0
> because, even if it was perhaps an accident, support was there?
> >>>
> >>> Yep. If others agree this is the way to go, I'd be happy to. We had
> >>> some other changes that worked "accidentally" but were never stated
> >>> that they work this way. I think it's a pretty good assumption (even
> >>> if it is implicit) that "params" set for dag triggering are "data" and
> >>> not "code". It could be python callable of course, but I think it's
> >>> kinda "abuse" - especially that it excludes triggering via CLI/UI.
> >>>
> >>> The thing is that we do not "specify" what is our "stable  API" and
> >>> what is not also in many other places, there is a certain ambiguity
> >>> for some of them. Of course it's not only whether it is "specified" or
> >>> "not", it's also much more "whether a lot of people could interpret
> >>> and use it in this way". I think (but maybe others can chime in)  - it
> >>> would be reasonable to assume that using callables or other Python
> >>> Code is expected when we already have:
> >>> a) CLI with string/JSON input
> >>> b) UI with string/JSON input
> >>> c) ability of using JSON-schema to actually verify the parameters
> >>> (this last one actually shows a clear intention of having those
> >>> parameters "data only")..
> >>>
> >>> J.
>

Re: [DISCUSS] non-json-serializable params

Posted by Jarek Potiuk <ja...@potiuk.com>.
Indeed - if we want to really "deprecate" (and drop in 3.0) support
for non-serializable params, then JSON serialization is the way to go.

The only "benefit" of using YAML is "set" support but if we are going
to "deprecate" non-serializable params, then we can easily include
"set" in being 'non-serializable" and use JSON. There is a good reason
why JSON does not support sets (because in serialized form it is
exactly the same as list - there is no difference, really)

J.

On Thu, Nov 11, 2021 at 9:24 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> -1 for breaking it again. We should go ahead with a deprecation route. JSON serializable makes sense, I am not fully convinced if YAML serializable is any better !
>
> Another note is the current params which Daniel fixed now use Serialization from our DAG Serialization - https://github.com/apache/airflow/blob/7622f5e08261afe5ab50a08a6ca0804af8c7c7fe/airflow/serialization/serialized_objects.py#L289-L330 so it currently supports Timedelta, Timezone, Datetime, Tuple etc objects.
>
> But I agree with Daniel that we should deprecate and only support JSON Serializable objects to make it fully featured like overriding it via CLI, API and Webserver.
>
> Regards,
> Kaxil
>
> On Thu, Nov 11, 2021 at 6:51 PM Daniel Standish <dp...@gmail.com> wrote:
>>
>> Yeah I agree with you.
>>
>> The one other thing I'll mention is the other use case that was raised in an issue was `datetime` which like set is also not json-serializable, but unlike set would probably not be yaml-serializable.
>>
>> But yeah let's see if others can help establish a consensus.
>>
>> Small note: another thing sortof in support of your position is that, if you can't override the param from UI and CLI and these other means (because it's not expecting something that can be serialize that way), then you don't even need it to be a `param` but it could just as easily be an operator or task arg instead.  I.e. if  you're staying in python you can use python; but what's special about params is they can be set from outside.  The other side of this though is that probably arbitrary dag params probably _did_ work with trigger dag run operator.
>>
>>
>> On Thu, Nov 11, 2021 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>> > So you would say in 2.2.3 we "break" that again?  Not wait for 3.0 because, even if it was perhaps an accident, support was there?
>>>
>>> Yep. If others agree this is the way to go, I'd be happy to. We had
>>> some other changes that worked "accidentally" but were never stated
>>> that they work this way. I think it's a pretty good assumption (even
>>> if it is implicit) that "params" set for dag triggering are "data" and
>>> not "code". It could be python callable of course, but I think it's
>>> kinda "abuse" - especially that it excludes triggering via CLI/UI.
>>>
>>> The thing is that we do not "specify" what is our "stable  API" and
>>> what is not also in many other places, there is a certain ambiguity
>>> for some of them. Of course it's not only whether it is "specified" or
>>> "not", it's also much more "whether a lot of people could interpret
>>> and use it in this way". I think (but maybe others can chime in)  - it
>>> would be reasonable to assume that using callables or other Python
>>> Code is expected when we already have:
>>> a) CLI with string/JSON input
>>> b) UI with string/JSON input
>>> c) ability of using JSON-schema to actually verify the parameters
>>> (this last one actually shows a clear intention of having those
>>> parameters "data only")..
>>>
>>> J.

Re: [DISCUSS] non-json-serializable params

Posted by Kaxil Naik <ka...@gmail.com>.
-1 for breaking it again. We should go ahead with a deprecation route. JSON
serializable makes sense, I am not fully convinced if YAML serializable is
any better !

Another note is the current params which Daniel fixed now use Serialization
from our DAG Serialization -
https://github.com/apache/airflow/blob/7622f5e08261afe5ab50a08a6ca0804af8c7c7fe/airflow/serialization/serialized_objects.py#L289-L330
so
it currently supports *Timedelta*, *Timezone*, *Datetime*, *Tuple* etc
objects.

But I agree with Daniel that we should deprecate and only support JSON
Serializable objects to make it fully featured like overriding it via CLI,
API and Webserver.

Regards,
Kaxil

On Thu, Nov 11, 2021 at 6:51 PM Daniel Standish <dp...@gmail.com>
wrote:

> Yeah I agree with you.
>
> The one other thing I'll mention is the other use case that was raised in
> an issue was `datetime` which like set is also not json-serializable, but
> unlike set would probably not be yaml-serializable.
>
> But yeah let's see if others can help establish a consensus.
>
> Small note: another thing sortof in support of your position is that, if
> you can't override the param from UI and CLI and these other means (because
> it's not expecting something that can be serialize that way), then you
> don't even need it to be a `param` but it could just as easily be an
> operator or task arg instead.  I.e. if  you're staying in python you can
> use python; but what's special about params is they can be set from
> outside.  The other side of this though is that probably arbitrary dag
> params probably _did_ work with trigger dag run operator.
>
>
> On Thu, Nov 11, 2021 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> > So you would say in 2.2.3 we "break" that again?  Not wait for 3.0
>> because, even if it was perhaps an accident, support was there?
>>
>> Yep. If others agree this is the way to go, I'd be happy to. We had
>> some other changes that worked "accidentally" but were never stated
>> that they work this way. I think it's a pretty good assumption (even
>> if it is implicit) that "params" set for dag triggering are "data" and
>> not "code". It could be python callable of course, but I think it's
>> kinda "abuse" - especially that it excludes triggering via CLI/UI.
>>
>> The thing is that we do not "specify" what is our "stable  API" and
>> what is not also in many other places, there is a certain ambiguity
>> for some of them. Of course it's not only whether it is "specified" or
>> "not", it's also much more "whether a lot of people could interpret
>> and use it in this way". I think (but maybe others can chime in)  - it
>> would be reasonable to assume that using callables or other Python
>> Code is expected when we already have:
>> a) CLI with string/JSON input
>> b) UI with string/JSON input
>> c) ability of using JSON-schema to actually verify the parameters
>> (this last one actually shows a clear intention of having those
>> parameters "data only")..
>>
>> J.
>>
>

Re: [DISCUSS] non-json-serializable params

Posted by Daniel Standish <dp...@gmail.com>.
Yeah I agree with you.

The one other thing I'll mention is the other use case that was raised in
an issue was `datetime` which like set is also not json-serializable, but
unlike set would probably not be yaml-serializable.

But yeah let's see if others can help establish a consensus.

Small note: another thing sortof in support of your position is that, if
you can't override the param from UI and CLI and these other means (because
it's not expecting something that can be serialize that way), then you
don't even need it to be a `param` but it could just as easily be an
operator or task arg instead.  I.e. if  you're staying in python you can
use python; but what's special about params is they can be set from
outside.  The other side of this though is that probably arbitrary dag
params probably _did_ work with trigger dag run operator.


On Thu, Nov 11, 2021 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> > So you would say in 2.2.3 we "break" that again?  Not wait for 3.0
> because, even if it was perhaps an accident, support was there?
>
> Yep. If others agree this is the way to go, I'd be happy to. We had
> some other changes that worked "accidentally" but were never stated
> that they work this way. I think it's a pretty good assumption (even
> if it is implicit) that "params" set for dag triggering are "data" and
> not "code". It could be python callable of course, but I think it's
> kinda "abuse" - especially that it excludes triggering via CLI/UI.
>
> The thing is that we do not "specify" what is our "stable  API" and
> what is not also in many other places, there is a certain ambiguity
> for some of them. Of course it's not only whether it is "specified" or
> "not", it's also much more "whether a lot of people could interpret
> and use it in this way". I think (but maybe others can chime in)  - it
> would be reasonable to assume that using callables or other Python
> Code is expected when we already have:
> a) CLI with string/JSON input
> b) UI with string/JSON input
> c) ability of using JSON-schema to actually verify the parameters
> (this last one actually shows a clear intention of having those
> parameters "data only")..
>
> J.
>

Re: [DISCUSS] non-json-serializable params

Posted by Jarek Potiuk <ja...@potiuk.com>.
> So you would say in 2.2.3 we "break" that again?  Not wait for 3.0 because, even if it was perhaps an accident, support was there?

Yep. If others agree this is the way to go, I'd be happy to. We had
some other changes that worked "accidentally" but were never stated
that they work this way. I think it's a pretty good assumption (even
if it is implicit) that "params" set for dag triggering are "data" and
not "code". It could be python callable of course, but I think it's
kinda "abuse" - especially that it excludes triggering via CLI/UI.

The thing is that we do not "specify" what is our "stable  API" and
what is not also in many other places, there is a certain ambiguity
for some of them. Of course it's not only whether it is "specified" or
"not", it's also much more "whether a lot of people could interpret
and use it in this way". I think (but maybe others can chime in)  - it
would be reasonable to assume that using callables or other Python
Code is expected when we already have:
a) CLI with string/JSON input
b) UI with string/JSON input
c) ability of using JSON-schema to actually verify the parameters
(this last one actually shows a clear intention of having those
parameters "data only")..

J.

Re: [DISCUSS] non-json-serializable params

Posted by Daniel Standish <dp...@gmail.com>.
Yeah 100% I have nothing against yaml-serializable and it sounds like a
good idea.  But my main issue is arbitrary python.

So you would say in 2.2.3 we "break" that again?  Not wait for 3.0 because,
even if it was perhaps an accident, support was there?

On Thu, Nov 11, 2021, 12:14 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> I can't really imagine a good way of supporting "dynamic python" via
> external parameters (pickling AAAAAAARGH!). So still I think your idea
> of deprecating non-serializable parameters is good.
>
> True - YAML serialization does not support "all-python", but it does
> support "all data structures".  But there is one difference vs
> "json-serializable".
>
> But in this case "non YAML serializable" as opposed to "non-json
> serializable" check we could introduce it very fast - even at patch
> level. We do not have to wait for Airflow 3 IMHO. We already know that
> people are using "sets" and this basically means that we have to go
> the "deprecation" route. On the other hand allowing arbitrary dynamic
> python parameters and not data structures should be considered as
> "accidental" (it was never an intention) and not really part of the
> public API, so we can simply "fix it" in 2.2.3 by requiring the
> parameters to be YAML-serializable.
>
> J.
>
>
>
> On Wed, Nov 10, 2021 at 11:11 PM Daniel Standish <dp...@gmail.com>
> wrote:
> >
> > OK but if you extend to yaml, you don't resolve the problem but only
> move the goalpost a little bit.  Because then the question remains; enforce
> yaml-serializable or arbitrary python (which is the soon-to-be status quo
> after 2.2.2 is released).
> >
> > On Wed, Nov 10, 2021 at 1:26 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >> Very good question.
> >>
> >> Maybe we can solve it differently. We could recommend JSON and the
> >> basic datasets that JSON supports, but it should be possible for
> >> Airflow to support YAML as input format for params.
> >>
> >> Every JSON is also valid YAML and you can use Yaml parser to parse
> >> JSON. Yaml supports sets (though the syntax is cumbersome)
> >> https://yaml.org/type/set.html . So if we support YAML for both API
> >> and UI, if you REALLY want to use set - you will be able to .
> >>
> >> My proposal is that all the examples etc. could show JSON and we
> >> should support it, but there should be an asterisk (*) that if you
> >> want advanced features like YAML, you could use YAML. Then we would
> >> not have to deal with deprecation which is problematic as we could
> >> only remove this feature in Airflow 3.
> >>
> >> J.
> >>
> >> On Wed, Nov 10, 2021 at 7:14 PM Daniel Standish <dp...@gmail.com>
> wrote:
> >> >
> >> > Prior to 2.2.0, you could use non-json-serializable params in a dag.
> Here's an example with `set`:
> >> >
> >> > @dag.task(params={"a": {1, 2, 3}, "b": [3, 4, 5]})
> >> > def set_param(intersection):
> >> >     print(intersection)
> >> >
> >> >
> >> > set_param("{{ params.a.intersection(params.b).pop() }}")
> >> >
> >> > In 2.2.0 this was broken, and in 2.2.2rc1 it should be fixed.
> >> >
> >> > There's a problem though.  An important part of params is the ability
> to override them when triggering a dag from the web UI or CLI.  But params
> supplied through either mechanism should be json-serializable.
> >> >
> >> > So on one hand we allow arbitrary params, and on the other hand we
> require them to be json serializable.  We can see how this causes a problem
> in the above example: if you provide `a` as a `list` in the UI, it won't
> have the method `intersection`.
> >> >
> >> > So the question: should we deprecate non-json-serializable params?
> >> >
> >> > My feeling is yes.  But I'm not sure how widely this flexibility is
> used in the wild.
> >> >
>

Re: [DISCUSS] non-json-serializable params

Posted by Jarek Potiuk <ja...@potiuk.com>.
I can't really imagine a good way of supporting "dynamic python" via
external parameters (pickling AAAAAAARGH!). So still I think your idea
of deprecating non-serializable parameters is good.

True - YAML serialization does not support "all-python", but it does
support "all data structures".  But there is one difference vs
"json-serializable".

But in this case "non YAML serializable" as opposed to "non-json
serializable" check we could introduce it very fast - even at patch
level. We do not have to wait for Airflow 3 IMHO. We already know that
people are using "sets" and this basically means that we have to go
the "deprecation" route. On the other hand allowing arbitrary dynamic
python parameters and not data structures should be considered as
"accidental" (it was never an intention) and not really part of the
public API, so we can simply "fix it" in 2.2.3 by requiring the
parameters to be YAML-serializable.

J.



On Wed, Nov 10, 2021 at 11:11 PM Daniel Standish <dp...@gmail.com> wrote:
>
> OK but if you extend to yaml, you don't resolve the problem but only move the goalpost a little bit.  Because then the question remains; enforce yaml-serializable or arbitrary python (which is the soon-to-be status quo after 2.2.2 is released).
>
> On Wed, Nov 10, 2021 at 1:26 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> Very good question.
>>
>> Maybe we can solve it differently. We could recommend JSON and the
>> basic datasets that JSON supports, but it should be possible for
>> Airflow to support YAML as input format for params.
>>
>> Every JSON is also valid YAML and you can use Yaml parser to parse
>> JSON. Yaml supports sets (though the syntax is cumbersome)
>> https://yaml.org/type/set.html . So if we support YAML for both API
>> and UI, if you REALLY want to use set - you will be able to .
>>
>> My proposal is that all the examples etc. could show JSON and we
>> should support it, but there should be an asterisk (*) that if you
>> want advanced features like YAML, you could use YAML. Then we would
>> not have to deal with deprecation which is problematic as we could
>> only remove this feature in Airflow 3.
>>
>> J.
>>
>> On Wed, Nov 10, 2021 at 7:14 PM Daniel Standish <dp...@gmail.com> wrote:
>> >
>> > Prior to 2.2.0, you could use non-json-serializable params in a dag. Here's an example with `set`:
>> >
>> > @dag.task(params={"a": {1, 2, 3}, "b": [3, 4, 5]})
>> > def set_param(intersection):
>> >     print(intersection)
>> >
>> >
>> > set_param("{{ params.a.intersection(params.b).pop() }}")
>> >
>> > In 2.2.0 this was broken, and in 2.2.2rc1 it should be fixed.
>> >
>> > There's a problem though.  An important part of params is the ability to override them when triggering a dag from the web UI or CLI.  But params supplied through either mechanism should be json-serializable.
>> >
>> > So on one hand we allow arbitrary params, and on the other hand we require them to be json serializable.  We can see how this causes a problem in the above example: if you provide `a` as a `list` in the UI, it won't have the method `intersection`.
>> >
>> > So the question: should we deprecate non-json-serializable params?
>> >
>> > My feeling is yes.  But I'm not sure how widely this flexibility is used in the wild.
>> >

Re: [DISCUSS] non-json-serializable params

Posted by Daniel Standish <dp...@gmail.com>.
OK but if you extend to yaml, you don't resolve the problem but only move
the goalpost a little bit.  Because then the question remains; enforce
yaml-serializable or arbitrary python (which is the soon-to-be status quo
after 2.2.2 is released).

On Wed, Nov 10, 2021 at 1:26 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Very good question.
>
> Maybe we can solve it differently. We could recommend JSON and the
> basic datasets that JSON supports, but it should be possible for
> Airflow to support YAML as input format for params.
>
> Every JSON is also valid YAML and you can use Yaml parser to parse
> JSON. Yaml supports sets (though the syntax is cumbersome)
> https://yaml.org/type/set.html . So if we support YAML for both API
> and UI, if you REALLY want to use set - you will be able to .
>
> My proposal is that all the examples etc. could show JSON and we
> should support it, but there should be an asterisk (*) that if you
> want advanced features like YAML, you could use YAML. Then we would
> not have to deal with deprecation which is problematic as we could
> only remove this feature in Airflow 3.
>
> J.
>
> On Wed, Nov 10, 2021 at 7:14 PM Daniel Standish <dp...@gmail.com>
> wrote:
> >
> > Prior to 2.2.0, you could use non-json-serializable params in a dag.
> Here's an example with `set`:
> >
> > @dag.task(params={"a": {1, 2, 3}, "b": [3, 4, 5]})
> > def set_param(intersection):
> >     print(intersection)
> >
> >
> > set_param("{{ params.a.intersection(params.b).pop() }}")
> >
> > In 2.2.0 this was broken, and in 2.2.2rc1 it should be fixed.
> >
> > There's a problem though.  An important part of params is the ability to
> override them when triggering a dag from the web UI or CLI.  But params
> supplied through either mechanism should be json-serializable.
> >
> > So on one hand we allow arbitrary params, and on the other hand we
> require them to be json serializable.  We can see how this causes a problem
> in the above example: if you provide `a` as a `list` in the UI, it won't
> have the method `intersection`.
> >
> > So the question: should we deprecate non-json-serializable params?
> >
> > My feeling is yes.  But I'm not sure how widely this flexibility is used
> in the wild.
> >
>

Re: [DISCUSS] non-json-serializable params

Posted by Jarek Potiuk <ja...@potiuk.com>.
Very good question.

Maybe we can solve it differently. We could recommend JSON and the
basic datasets that JSON supports, but it should be possible for
Airflow to support YAML as input format for params.

Every JSON is also valid YAML and you can use Yaml parser to parse
JSON. Yaml supports sets (though the syntax is cumbersome)
https://yaml.org/type/set.html . So if we support YAML for both API
and UI, if you REALLY want to use set - you will be able to .

My proposal is that all the examples etc. could show JSON and we
should support it, but there should be an asterisk (*) that if you
want advanced features like YAML, you could use YAML. Then we would
not have to deal with deprecation which is problematic as we could
only remove this feature in Airflow 3.

J.

On Wed, Nov 10, 2021 at 7:14 PM Daniel Standish <dp...@gmail.com> wrote:
>
> Prior to 2.2.0, you could use non-json-serializable params in a dag. Here's an example with `set`:
>
> @dag.task(params={"a": {1, 2, 3}, "b": [3, 4, 5]})
> def set_param(intersection):
>     print(intersection)
>
>
> set_param("{{ params.a.intersection(params.b).pop() }}")
>
> In 2.2.0 this was broken, and in 2.2.2rc1 it should be fixed.
>
> There's a problem though.  An important part of params is the ability to override them when triggering a dag from the web UI or CLI.  But params supplied through either mechanism should be json-serializable.
>
> So on one hand we allow arbitrary params, and on the other hand we require them to be json serializable.  We can see how this causes a problem in the above example: if you provide `a` as a `list` in the UI, it won't have the method `intersection`.
>
> So the question: should we deprecate non-json-serializable params?
>
> My feeling is yes.  But I'm not sure how widely this flexibility is used in the wild.
>