You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Daniel Standish <da...@astronomer.io.INVALID> on 2022/02/25 07:23:20 UTC
[LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
It's generally assumed that the `extra` field in airflow's Connection model
is JSON string. However, it's not, strictly speaking, *required* to be so.
I believe we should require it to be JSON.
But I also think we should nudge this a tiny bit further. A python string
value such as '"hi"' contains a valid json string "hi". And similarly the
string '[0,2,3]' is _also_ a valid string. But this is not at all what is
intended for `extra` and, I think for pretty obvious reasons, a bad idea.
So I think we should _also_ require that the value for `extra`, if
provided, must be json that parses as a python _dict_.
So, to summarize, the proposal is, from release 3.0, require that conn
`extra` be json (or None) and require that the json (if provided) must
parse as a dict.
PR to implement deprecation as prescribed by the proposal is here
<https://github.com/apache/airflow/pull/21816>.
This vote will run until Tuesday at 8am UTC (three full weekdays).
Thanks for your consideration.
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Jarek Potiuk <ja...@potiuk.com>.
Mainly remove FAB components from hook definition for custom fields. They
should be all declarative but this needs some UI wizardry to rewrite the
"Connections" screen (and some back-compatibility/deprecations).
J
On Wed, Mar 9, 2022 at 6:11 AM Daniel Standish
<da...@astronomer.io.invalid> wrote:
> For the record, the vote [lazily] passed, and the PR
> <https://github.com/apache/airflow/pull/21816> was merged.
>
> As Ash has noted, this will in effect make conn.extra and
> conn.extra_dejson redundant (since, by requiring extra to be json, there's
> no reason we can't store it as a dict in the first place). But we have to
> think it through, particularly regarding backcompat strategy, and for now
> this is left unresolved. And I agree with Jarek that we should improve
> certain aspects of the custom fields. I can think of two issues: empty
> string as default value and ugly keys (e.g `extra__google__keyfile_dict`)
> -- would be good to hear what other areas of improvement you have in mind
> Jarek.
>
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Daniel Standish <da...@astronomer.io.INVALID>.
For the record, the vote [lazily] passed, and the PR
<https://github.com/apache/airflow/pull/21816> was merged.
As Ash has noted, this will in effect make conn.extra and conn.extra_dejson
redundant (since, by requiring extra to be json, there's no reason we can't
store it as a dict in the first place). But we have to think it through,
particularly regarding backcompat strategy, and for now this is left
unresolved. And I agree with Jarek that we should improve certain aspects
of the custom fields. I can think of two issues: empty string as default
value and ugly keys (e.g `extra__google__keyfile_dict`) -- would be good to
hear what other areas of improvement you have in mind Jarek.
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Jarek Potiuk <ja...@potiuk.com>.
+1. And I would also connect this with quite a revamp of connection
behaviour (especially w/regards to UI and custom fields).
Those are very connected and the way it is done now is really, really
terrible (I did backwards-compatible implementation of some of it when we
split into providers and I can tell some horror stories on how bad
it turned out). This ghost of the past should be killed in Airflow 3.
On Sun, Feb 27, 2022 at 6:40 PM Brent Bovenzi <br...@astronomer.io.invalid>
wrote:
> +1
>
> On Sat, Feb 26, 2022 at 11:58 AM Tomasz Urbaszek <tu...@gmail.com>
> wrote:
>
>> > > Should we provide some way of validating existing connections so
>> users can check this before upgrading to 3.0?
>> > Sounds like a good idea
>>
>> I think we may consider refreshing the airflow upgrade scripts
>>
>> On Fri, 25 Feb 2022 at 22:04, Daniel Standish
>> <da...@astronomer.io.invalid> wrote:
>>
>>> > What does this mean for the extra and extra_dejson attrs that exist on
>>> Connection right now?
>>>
>>> It's a good question. I think ideally we should deprecate extra_dejson,
>>> and `extra` should be dict (require json-encodable), with a db type of JSON
>>> where supported. But the path to get there seems a little murkier.
>>> Deprecating extra_dejson would be straightforward. But how would you
>>> deprecate extra as string, while enabling folks to migrate to extra as dict
>>> in place... Maybe make it a `str` subclass that implements `get`?
>>>
>>> Something like this:
>>>
>>> class ExtraDict(str):
>>>
>>> def __getitem__(self, item):
>>> return json.loads(self)[item]
>>>
>>> def get(self, item, default=None):
>>> return json.loads(self).get(item, default)
>>>
>>>
>>> So then it behaves mostly like a string, but can also be accessed like a
>>> dict?
>>>
>>> What do you think?
>>>
>>> > Should we provide some way of validating existing connections so users
>>> can check this before upgrading to 3.0?
>>>
>>> Sounds like a good idea.
>>>
>>>
>>>
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Brent Bovenzi <br...@astronomer.io.INVALID>.
+1
On Sat, Feb 26, 2022 at 11:58 AM Tomasz Urbaszek <tu...@gmail.com>
wrote:
> > > Should we provide some way of validating existing connections so users
> can check this before upgrading to 3.0?
> > Sounds like a good idea
>
> I think we may consider refreshing the airflow upgrade scripts
>
> On Fri, 25 Feb 2022 at 22:04, Daniel Standish
> <da...@astronomer.io.invalid> wrote:
>
>> > What does this mean for the extra and extra_dejson attrs that exist on
>> Connection right now?
>>
>> It's a good question. I think ideally we should deprecate extra_dejson,
>> and `extra` should be dict (require json-encodable), with a db type of JSON
>> where supported. But the path to get there seems a little murkier.
>> Deprecating extra_dejson would be straightforward. But how would you
>> deprecate extra as string, while enabling folks to migrate to extra as dict
>> in place... Maybe make it a `str` subclass that implements `get`?
>>
>> Something like this:
>>
>> class ExtraDict(str):
>>
>> def __getitem__(self, item):
>> return json.loads(self)[item]
>>
>> def get(self, item, default=None):
>> return json.loads(self).get(item, default)
>>
>>
>> So then it behaves mostly like a string, but can also be accessed like a
>> dict?
>>
>> What do you think?
>>
>> > Should we provide some way of validating existing connections so users
>> can check this before upgrading to 3.0?
>>
>> Sounds like a good idea.
>>
>>
>>
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Tomasz Urbaszek <tu...@gmail.com>.
> > Should we provide some way of validating existing connections so users
can check this before upgrading to 3.0?
> Sounds like a good idea
I think we may consider refreshing the airflow upgrade scripts
On Fri, 25 Feb 2022 at 22:04, Daniel Standish
<da...@astronomer.io.invalid> wrote:
> > What does this mean for the extra and extra_dejson attrs that exist on
> Connection right now?
>
> It's a good question. I think ideally we should deprecate extra_dejson,
> and `extra` should be dict (require json-encodable), with a db type of JSON
> where supported. But the path to get there seems a little murkier.
> Deprecating extra_dejson would be straightforward. But how would you
> deprecate extra as string, while enabling folks to migrate to extra as dict
> in place... Maybe make it a `str` subclass that implements `get`?
>
> Something like this:
>
> class ExtraDict(str):
>
> def __getitem__(self, item):
> return json.loads(self)[item]
>
> def get(self, item, default=None):
> return json.loads(self).get(item, default)
>
>
> So then it behaves mostly like a string, but can also be accessed like a
> dict?
>
> What do you think?
>
> > Should we provide some way of validating existing connections so users
> can check this before upgrading to 3.0?
>
> Sounds like a good idea.
>
>
>
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Daniel Standish <da...@astronomer.io.INVALID>.
> What does this mean for the extra and extra_dejson attrs that exist on
Connection right now?
It's a good question. I think ideally we should deprecate extra_dejson,
and `extra` should be dict (require json-encodable), with a db type of JSON
where supported. But the path to get there seems a little murkier.
Deprecating extra_dejson would be straightforward. But how would you
deprecate extra as string, while enabling folks to migrate to extra as dict
in place... Maybe make it a `str` subclass that implements `get`?
Something like this:
class ExtraDict(str):
def __getitem__(self, item):
return json.loads(self)[item]
def get(self, item, default=None):
return json.loads(self).get(item, default)
So then it behaves mostly like a string, but can also be accessed like a
dict?
What do you think?
> Should we provide some way of validating existing connections so users
can check this before upgrading to 3.0?
Sounds like a good idea.
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Josh Fell <jo...@astronomer.io.INVALID>.
+1
Extra is required to be a JSON when connections are via in the UI now (a
big hairy error message is displayed if it isn't) so hopefully that reduces
blast radius for existing connections a bit.
On Fri, Feb 25, 2022 at 7:41 AM Kaxil Naik <ka...@gmail.com> wrote:
> +1
>
> On Fri, 25 Feb 2022 at 12:35, Tomasz Urbaszek <tu...@gmail.com> wrote:
>
>> +1 for this idea.
>>
>> Should we provide some way of validating existing connections so users
>> can check this before upgrading to 3.0?
>>
>> Thanks,
>> Tomek
>>
>> On Fri, 25 Feb 2022 at 10:14, Ash Berlin-Taylor <as...@apache.org> wrote:
>>
>>> I'm all for this.
>>>
>>> What does this mean for the extra and extra_dejson attrs that exist on
>>> Connection right now?
>>>
>>> -a
>>>
>>> On Thu, Feb 24 2022 at 23:23:20 -0800, Daniel Standish
>>> <da...@astronomer.io.INVALID> wrote:
>>>
>>> It's generally assumed that the `extra` field in airflow's Connection
>>> model is JSON string. However, it's not, strictly speaking, *required*
>>> to be so.
>>>
>>> I believe we should require it to be JSON.
>>>
>>> But I also think we should nudge this a tiny bit further. A python
>>> string value such as '"hi"' contains a valid json string "hi". And
>>> similarly the string '[0,2,3]' is _also_ a valid string. But this is not
>>> at all what is intended for `extra` and, I think for pretty obvious
>>> reasons, a bad idea. So I think we should _also_ require that the value
>>> for `extra`, if provided, must be json that parses as a python _dict_.
>>>
>>> So, to summarize, the proposal is, from release 3.0, require that conn
>>> `extra` be json (or None) and require that the json (if provided) must
>>> parse as a dict.
>>>
>>> PR to implement deprecation as prescribed by the proposal is here
>>> <https://github.com/apache/airflow/pull/21816>.
>>>
>>> This vote will run until Tuesday at 8am UTC (three full weekdays).
>>>
>>> Thanks for your consideration.
>>>
>>>
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Kaxil Naik <ka...@gmail.com>.
+1
On Fri, 25 Feb 2022 at 12:35, Tomasz Urbaszek <tu...@gmail.com> wrote:
> +1 for this idea.
>
> Should we provide some way of validating existing connections so users can
> check this before upgrading to 3.0?
>
> Thanks,
> Tomek
>
> On Fri, 25 Feb 2022 at 10:14, Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> I'm all for this.
>>
>> What does this mean for the extra and extra_dejson attrs that exist on
>> Connection right now?
>>
>> -a
>>
>> On Thu, Feb 24 2022 at 23:23:20 -0800, Daniel Standish
>> <da...@astronomer.io.INVALID> wrote:
>>
>> It's generally assumed that the `extra` field in airflow's Connection
>> model is JSON string. However, it's not, strictly speaking, *required*
>> to be so.
>>
>> I believe we should require it to be JSON.
>>
>> But I also think we should nudge this a tiny bit further. A python
>> string value such as '"hi"' contains a valid json string "hi". And
>> similarly the string '[0,2,3]' is _also_ a valid string. But this is not
>> at all what is intended for `extra` and, I think for pretty obvious
>> reasons, a bad idea. So I think we should _also_ require that the value
>> for `extra`, if provided, must be json that parses as a python _dict_.
>>
>> So, to summarize, the proposal is, from release 3.0, require that conn
>> `extra` be json (or None) and require that the json (if provided) must
>> parse as a dict.
>>
>> PR to implement deprecation as prescribed by the proposal is here
>> <https://github.com/apache/airflow/pull/21816>.
>>
>> This vote will run until Tuesday at 8am UTC (three full weekdays).
>>
>> Thanks for your consideration.
>>
>>
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Tomasz Urbaszek <tu...@gmail.com>.
+1 for this idea.
Should we provide some way of validating existing connections so users can
check this before upgrading to 3.0?
Thanks,
Tomek
On Fri, 25 Feb 2022 at 10:14, Ash Berlin-Taylor <as...@apache.org> wrote:
> I'm all for this.
>
> What does this mean for the extra and extra_dejson attrs that exist on
> Connection right now?
>
> -a
>
> On Thu, Feb 24 2022 at 23:23:20 -0800, Daniel Standish
> <da...@astronomer.io.INVALID> wrote:
>
> It's generally assumed that the `extra` field in airflow's Connection
> model is JSON string. However, it's not, strictly speaking, *required*
> to be so.
>
> I believe we should require it to be JSON.
>
> But I also think we should nudge this a tiny bit further. A python string
> value such as '"hi"' contains a valid json string "hi". And similarly the
> string '[0,2,3]' is _also_ a valid string. But this is not at all what is
> intended for `extra` and, I think for pretty obvious reasons, a bad idea.
> So I think we should _also_ require that the value for `extra`, if
> provided, must be json that parses as a python _dict_.
>
> So, to summarize, the proposal is, from release 3.0, require that conn
> `extra` be json (or None) and require that the json (if provided) must
> parse as a dict.
>
> PR to implement deprecation as prescribed by the proposal is here
> <https://github.com/apache/airflow/pull/21816>.
>
> This vote will run until Tuesday at 8am UTC (three full weekdays).
>
> Thanks for your consideration.
>
>
Re: [LAZY CONSENSUS] Deprecate non-JSON extra, and require a dict
Posted by Ash Berlin-Taylor <as...@apache.org>.
I'm all for this.
What does this mean for the extra and extra_dejson attrs that exist on
Connection right now?
-a
On Thu, Feb 24 2022 at 23:23:20 -0800, Daniel Standish
<da...@astronomer.io.INVALID> wrote:
> It's generally assumed that the `extra` field in airflow's Connection
> model is JSON string. However, it's not, strictly speaking,
> /required/ to be so.
>
> I believe we should require it to be JSON.
>
> But I also think we should nudge this a tiny bit further. A python
> string value such as '"hi"' contains a valid json string "hi". And
> similarly the string '[0,2,3]' is _also_ a valid string. But this is
> not at all what is intended for `extra` and, I think for pretty
> obvious reasons, a bad idea. So I think we should _also_ require
> that the value for `extra`, if provided, must be json that parses as
> a python _dict_.
>
> So, to summarize, the proposal is, from release 3.0, require that
> conn `extra` be json (or None) and require that the json (if
> provided) must parse as a dict.
>
> PR to implement deprecation as prescribed by the proposal is here
> <https://github.com/apache/airflow/pull/21816>.
>
> This vote will run until Tuesday at 8am UTC (three full weekdays).
>
> Thanks for your consideration.