You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Daniel Standish <da...@astronomer.io.INVALID> on 2022/02/25 07:23:20 UTC

[LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

It's generally assumed that the `extra` field in airflow's Connection model
is JSON string.  However, it's not, strictly speaking, *required* to be so.

I believe we should require it to be JSON.

But I also think we should nudge this a tiny bit further.  A python string
value such as '"hi"' contains a valid json string "hi".  And similarly the
string '[0,2,3]' is _also_ a valid string.  But this is not at all what is
intended for `extra` and, I think for pretty obvious reasons, a bad idea.
So I think we should _also_ require that the value for `extra`, if
provided, must be json that parses as a python _dict_.

So, to summarize, the proposal is, from release 3.0, require that conn
`extra` be json (or None) and require that the json (if provided) must
parse as a dict.

PR to implement deprecation as prescribed by the proposal is here
<https://github.com/apache/airflow/pull/21816>.

This vote will run until Tuesday at 8am UTC (three full weekdays).

Thanks for your consideration.

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Jarek Potiuk <ja...@potiuk.com>.
Mainly remove FAB components from hook definition for custom fields. They
should be all declarative but this needs some UI wizardry to rewrite the
"Connections" screen (and some back-compatibility/deprecations).

J


On Wed, Mar 9, 2022 at 6:11 AM Daniel Standish
<da...@astronomer.io.invalid> wrote:

> For the record, the vote [lazily] passed, and the PR
> <https://github.com/apache/airflow/pull/21816> was merged.
>
> As Ash has noted, this will in effect make conn.extra and
> conn.extra_dejson redundant (since, by requiring extra to be json, there's
> no reason we can't store it as a dict in the first place).  But we have to
> think it through, particularly regarding backcompat strategy, and for now
> this is left unresolved.  And I agree with Jarek that we should improve
> certain aspects of the custom fields. I can think of two issues: empty
> string as default value and ugly keys (e.g `extra__google__keyfile_dict`)
> -- would be good to hear what other areas of improvement you have in mind
> Jarek.
>

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
For the record, the vote [lazily] passed, and the PR
<https://github.com/apache/airflow/pull/21816> was merged.

As Ash has noted, this will in effect make conn.extra and conn.extra_dejson
redundant (since, by requiring extra to be json, there's no reason we can't
store it as a dict in the first place).  But we have to think it through,
particularly regarding backcompat strategy, and for now this is left
unresolved.  And I agree with Jarek that we should improve certain aspects
of the custom fields. I can think of two issues: empty string as default
value and ugly keys (e.g `extra__google__keyfile_dict`) -- would be good to
hear what other areas of improvement you have in mind Jarek.

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Jarek Potiuk <ja...@potiuk.com>.
+1. And I would also connect this with quite a revamp of connection
behaviour (especially w/regards to UI and custom fields).

Those are very connected and the way it is done now is really, really
terrible (I did backwards-compatible implementation of some of it when we
split into providers and I can tell some horror stories on how bad
it turned out). This ghost of the past should be killed in Airflow 3.

On Sun, Feb 27, 2022 at 6:40 PM Brent Bovenzi <br...@astronomer.io.invalid>
wrote:

> +1
>
> On Sat, Feb 26, 2022 at 11:58 AM Tomasz Urbaszek <tu...@gmail.com>
> wrote:
>
>> > > Should we provide some way of validating existing connections so
>> users can check this before upgrading to 3.0?
>> > Sounds like a good idea
>>
>> I think we may consider refreshing the airflow upgrade scripts
>>
>> On Fri, 25 Feb 2022 at 22:04, Daniel Standish
>> <da...@astronomer.io.invalid> wrote:
>>
>>> > What does this mean for the extra and extra_dejson attrs that exist on
>>> Connection right now?
>>>
>>> It's a good question.  I think ideally we should deprecate extra_dejson,
>>> and `extra` should be dict (require json-encodable), with a db type of JSON
>>> where supported.  But the path to get there seems a little murkier.
>>> Deprecating extra_dejson would be straightforward.  But how would you
>>> deprecate extra as string, while enabling folks to migrate to extra as dict
>>> in place...  Maybe make it a `str` subclass that implements `get`?
>>>
>>> Something like this:
>>>
>>> class ExtraDict(str):
>>>
>>>     def __getitem__(self, item):
>>>         return json.loads(self)[item]
>>>
>>>     def get(self, item, default=None):
>>>         return json.loads(self).get(item, default)
>>>
>>>
>>> So then it behaves mostly like a string, but can also be accessed like a
>>> dict?
>>>
>>> What do you think?
>>>
>>> > Should we provide some way of validating existing connections so users
>>> can check this before upgrading to 3.0?
>>>
>>> Sounds like a good idea.
>>>
>>>
>>>

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Brent Bovenzi <br...@astronomer.io.INVALID>.
+1

On Sat, Feb 26, 2022 at 11:58 AM Tomasz Urbaszek <tu...@gmail.com>
wrote:

> > > Should we provide some way of validating existing connections so users
> can check this before upgrading to 3.0?
> > Sounds like a good idea
>
> I think we may consider refreshing the airflow upgrade scripts
>
> On Fri, 25 Feb 2022 at 22:04, Daniel Standish
> <da...@astronomer.io.invalid> wrote:
>
>> > What does this mean for the extra and extra_dejson attrs that exist on
>> Connection right now?
>>
>> It's a good question.  I think ideally we should deprecate extra_dejson,
>> and `extra` should be dict (require json-encodable), with a db type of JSON
>> where supported.  But the path to get there seems a little murkier.
>> Deprecating extra_dejson would be straightforward.  But how would you
>> deprecate extra as string, while enabling folks to migrate to extra as dict
>> in place...  Maybe make it a `str` subclass that implements `get`?
>>
>> Something like this:
>>
>> class ExtraDict(str):
>>
>>     def __getitem__(self, item):
>>         return json.loads(self)[item]
>>
>>     def get(self, item, default=None):
>>         return json.loads(self).get(item, default)
>>
>>
>> So then it behaves mostly like a string, but can also be accessed like a
>> dict?
>>
>> What do you think?
>>
>> > Should we provide some way of validating existing connections so users
>> can check this before upgrading to 3.0?
>>
>> Sounds like a good idea.
>>
>>
>>

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Tomasz Urbaszek <tu...@gmail.com>.
> > Should we provide some way of validating existing connections so users
can check this before upgrading to 3.0?
> Sounds like a good idea

I think we may consider refreshing the airflow upgrade scripts

On Fri, 25 Feb 2022 at 22:04, Daniel Standish
<da...@astronomer.io.invalid> wrote:

> > What does this mean for the extra and extra_dejson attrs that exist on
> Connection right now?
>
> It's a good question.  I think ideally we should deprecate extra_dejson,
> and `extra` should be dict (require json-encodable), with a db type of JSON
> where supported.  But the path to get there seems a little murkier.
> Deprecating extra_dejson would be straightforward.  But how would you
> deprecate extra as string, while enabling folks to migrate to extra as dict
> in place...  Maybe make it a `str` subclass that implements `get`?
>
> Something like this:
>
> class ExtraDict(str):
>
>     def __getitem__(self, item):
>         return json.loads(self)[item]
>
>     def get(self, item, default=None):
>         return json.loads(self).get(item, default)
>
>
> So then it behaves mostly like a string, but can also be accessed like a
> dict?
>
> What do you think?
>
> > Should we provide some way of validating existing connections so users
> can check this before upgrading to 3.0?
>
> Sounds like a good idea.
>
>
>

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
> What does this mean for the extra and extra_dejson attrs that exist on
Connection right now?

It's a good question.  I think ideally we should deprecate extra_dejson,
and `extra` should be dict (require json-encodable), with a db type of JSON
where supported.  But the path to get there seems a little murkier.
Deprecating extra_dejson would be straightforward.  But how would you
deprecate extra as string, while enabling folks to migrate to extra as dict
in place...  Maybe make it a `str` subclass that implements `get`?

Something like this:

class ExtraDict(str):

    def __getitem__(self, item):
        return json.loads(self)[item]

    def get(self, item, default=None):
        return json.loads(self).get(item, default)


So then it behaves mostly like a string, but can also be accessed like a
dict?

What do you think?

> Should we provide some way of validating existing connections so users
can check this before upgrading to 3.0?

Sounds like a good idea.

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Josh Fell <jo...@astronomer.io.INVALID>.
+1

Extra is required to be a JSON when connections are via in the UI now (a
big hairy error message is displayed if it isn't) so hopefully that reduces
blast radius for existing connections a bit.

On Fri, Feb 25, 2022 at 7:41 AM Kaxil Naik <ka...@gmail.com> wrote:

> +1
>
> On Fri, 25 Feb 2022 at 12:35, Tomasz Urbaszek <tu...@gmail.com> wrote:
>
>> +1 for this idea.
>>
>> Should we provide some way of validating existing connections so users
>> can check this before upgrading to 3.0?
>>
>> Thanks,
>> Tomek
>>
>> On Fri, 25 Feb 2022 at 10:14, Ash Berlin-Taylor <as...@apache.org> wrote:
>>
>>> I'm all for this.
>>>
>>> What does this mean for the extra and extra_dejson attrs that exist on
>>> Connection right now?
>>>
>>> -a
>>>
>>> On Thu, Feb 24 2022 at 23:23:20 -0800, Daniel Standish
>>> <da...@astronomer.io.INVALID> wrote:
>>>
>>> It's generally assumed that the `extra` field in airflow's Connection
>>> model is JSON string.  However, it's not, strictly speaking, *required*
>>> to be so.
>>>
>>> I believe we should require it to be JSON.
>>>
>>> But I also think we should nudge this a tiny bit further.  A python
>>> string value such as '"hi"' contains a valid json string "hi".  And
>>> similarly the string '[0,2,3]' is _also_ a valid string.  But this is not
>>> at all what is intended for `extra` and, I think for pretty obvious
>>> reasons, a bad idea.  So I think we should _also_ require that the value
>>> for `extra`, if provided, must be json that parses as a python _dict_.
>>>
>>> So, to summarize, the proposal is, from release 3.0, require that conn
>>> `extra` be json (or None) and require that the json (if provided) must
>>> parse as a dict.
>>>
>>> PR to implement deprecation as prescribed by the proposal is here
>>> <https://github.com/apache/airflow/pull/21816>.
>>>
>>> This vote will run until Tuesday at 8am UTC (three full weekdays).
>>>
>>> Thanks for your consideration.
>>>
>>>

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Kaxil Naik <ka...@gmail.com>.
+1

On Fri, 25 Feb 2022 at 12:35, Tomasz Urbaszek <tu...@gmail.com> wrote:

> +1 for this idea.
>
> Should we provide some way of validating existing connections so users can
> check this before upgrading to 3.0?
>
> Thanks,
> Tomek
>
> On Fri, 25 Feb 2022 at 10:14, Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> I'm all for this.
>>
>> What does this mean for the extra and extra_dejson attrs that exist on
>> Connection right now?
>>
>> -a
>>
>> On Thu, Feb 24 2022 at 23:23:20 -0800, Daniel Standish
>> <da...@astronomer.io.INVALID> wrote:
>>
>> It's generally assumed that the `extra` field in airflow's Connection
>> model is JSON string.  However, it's not, strictly speaking, *required*
>> to be so.
>>
>> I believe we should require it to be JSON.
>>
>> But I also think we should nudge this a tiny bit further.  A python
>> string value such as '"hi"' contains a valid json string "hi".  And
>> similarly the string '[0,2,3]' is _also_ a valid string.  But this is not
>> at all what is intended for `extra` and, I think for pretty obvious
>> reasons, a bad idea.  So I think we should _also_ require that the value
>> for `extra`, if provided, must be json that parses as a python _dict_.
>>
>> So, to summarize, the proposal is, from release 3.0, require that conn
>> `extra` be json (or None) and require that the json (if provided) must
>> parse as a dict.
>>
>> PR to implement deprecation as prescribed by the proposal is here
>> <https://github.com/apache/airflow/pull/21816>.
>>
>> This vote will run until Tuesday at 8am UTC (three full weekdays).
>>
>> Thanks for your consideration.
>>
>>

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Tomasz Urbaszek <tu...@gmail.com>.
+1 for this idea.

Should we provide some way of validating existing connections so users can
check this before upgrading to 3.0?

Thanks,
Tomek

On Fri, 25 Feb 2022 at 10:14, Ash Berlin-Taylor <as...@apache.org> wrote:

> I'm all for this.
>
> What does this mean for the extra and extra_dejson attrs that exist on
> Connection right now?
>
> -a
>
> On Thu, Feb 24 2022 at 23:23:20 -0800, Daniel Standish
> <da...@astronomer.io.INVALID> wrote:
>
> It's generally assumed that the `extra` field in airflow's Connection
> model is JSON string.  However, it's not, strictly speaking, *required*
> to be so.
>
> I believe we should require it to be JSON.
>
> But I also think we should nudge this a tiny bit further.  A python string
> value such as '"hi"' contains a valid json string "hi".  And similarly the
> string '[0,2,3]' is _also_ a valid string.  But this is not at all what is
> intended for `extra` and, I think for pretty obvious reasons, a bad idea.
> So I think we should _also_ require that the value for `extra`, if
> provided, must be json that parses as a python _dict_.
>
> So, to summarize, the proposal is, from release 3.0, require that conn
> `extra` be json (or None) and require that the json (if provided) must
> parse as a dict.
>
> PR to implement deprecation as prescribed by the proposal is here
> <https://github.com/apache/airflow/pull/21816>.
>
> This vote will run until Tuesday at 8am UTC (three full weekdays).
>
> Thanks for your consideration.
>
>

Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict

Posted by Ash Berlin-Taylor <as...@apache.org>.
I'm all for this.

What does this mean for the extra and extra_dejson attrs that exist on 
Connection right now?

-a

On Thu, Feb 24 2022 at 23:23:20 -0800, Daniel Standish 
<da...@astronomer.io.INVALID> wrote:
> It's generally assumed that the `extra` field in airflow's Connection 
> model is JSON string.  However, it's not, strictly speaking, 
> /required/ to be so.
> 
> I believe we should require it to be JSON.
> 
> But I also think we should nudge this a tiny bit further.  A python 
> string value such as '"hi"' contains a valid json string "hi".  And 
> similarly the string '[0,2,3]' is _also_ a valid string.  But this is 
> not at all what is intended for `extra` and, I think for pretty 
> obvious reasons, a bad idea.  So I think we should _also_ require 
> that the value for `extra`, if provided, must be json that parses as 
> a python _dict_.
> 
> So, to summarize, the proposal is, from release 3.0, require that 
> conn `extra` be json (or None) and require that the json (if 
> provided) must parse as a dict.
> 
> PR to implement deprecation as prescribed by the proposal is here 
> <https://github.com/apache/airflow/pull/21816>.
> 
> This vote will run until Tuesday at 8am UTC (three full weekdays).
> 
> Thanks for your consideration.