You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Ash Berlin-Taylor <as...@apache.org> on 2022/08/02 10:43:06 UTC

Auto-registering of DAGs in DAG file? (no `as dag` needed?)

Hello all,

I'm on a bit of a kick thinking about developer (specifically DAG 
author) experience  and if there is anything we can

Some time ago there was a previous conversation about if we 
should/could "autoregister" DAGs, rather than just looking at the 
objects in the top level (globals()) of a file, an I knocked up this PR

<https://github.com/apache/airflow/pull/23592>

The question I have for you all is do we think this is  good idea? It 
does somewhat subtly change the behaviour in a few cases. Lets take 
this example this from the docs 
<https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags>

dag_1 = DAG('this_dag_will_be_discovered')

def my_function():
    dag_2 = DAG('but_this_dag_will_not')

my_function()

As implemented right now the second dag won't get picked up (as the 
auto registration is handled in the context manager, but if the example 
was changed to use a context manager it will get loaded/discovered:

with DAG('this_dag_will_be_discovered'):
    EmptyOperator(task_id='task')


def my_function():
    with DAG('so_will_this_dag_now'):
        EmptyOperator(task_id='task')

my_function()

With the change in my PR both DAGs would be picked up.Does that count 
as a breaking change do you think? Is this behaviour more helpful to 
users, or do we think it would be confusing?

(If I get a few thumbs up I will update the docs in my PR to cover this 
new behaviour.)

-ash



Re: Auto-registering of DAGs in DAG file? (no `as dag` needed?)

Posted by Bas Harenslak <ba...@astronomer.io.INVALID>.
For me having to add “as dag” is somewhat non-evident. Also, it’s a few less boilerplate characters you have to write which I think is a good thing in this case.

Curious on the edge cases in DAG generation as mentioned by Felix, but I like the idea so +1 for me.

Bas

> On 2 Aug 2022, at 13:18, Felix Uellendall <fe...@pm.me.INVALID> wrote:
> 
> Hey Ash,
> 
> I personally don't like it, because it is not obvious to me. 
> 
> Also what happens if you return the `dag_2` variable and set the return value in the global context to `dag_2` as well? This is how I used to do it when generating DAGs - and in my opinion this is pythonic way of doing it without any magic. I mean magic is nice as long as it works..
> 
> Keep in mind that also some people use functions to hide the dag creation i.e. factory pattern to clearly separate it from callers context (e.g. business logic). Your solution would blurry this line.
> 
> So I am leaning towards a "No", but keen to know what others think :)
> 
> Best,
> Felix
> 
> 
> Sent with Proton Mail <https://proton.me/> secure email.
> 
> ------- Original Message -------
> On Tuesday, August 2nd, 2022 at 12:43, Ash Berlin-Taylor <as...@apache.org> wrote:
> 
>> Hello all,
>> 
>> I'm on a bit of a kick thinking about developer (specifically DAG author) experience  and if there is anything we can 
>> 
>> Some time ago there was a previous conversation about if we should/could "autoregister" DAGs, rather than just looking at the objects in the top level (globals()) of a file, an I knocked up this PR
>> 
>> https://github.com/apache/airflow/pull/23592 <https://github.com/apache/airflow/pull/23592>
>> 
>> The question I have for you all is do we think this is  good idea? It does somewhat subtly change the behaviour in a few cases. Lets take this example this from the docs https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags <https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags>
>> 
>> dag_1 = DAG('this_dag_will_be_discovered')
>> 
>> def my_function():
>>     dag_2 = DAG('but_this_dag_will_not')
>>  
>> my_function()
>> 
>> As implemented right now the second dag won't get picked up (as the auto registration is handled in the context manager, but if the example was changed to use a context manager it will get loaded/discovered:
>> 
>> with DAG('this_dag_will_be_discovered'):
>>     EmptyOperator(task_id='task')
>> 
>> 
>> def my_function():
>>     with DAG('so_will_this_dag_now'):
>>         EmptyOperator(task_id='task')
>>  
>> my_function()
>> 
>> With the change in my PR both DAGs would be picked up. Does that count as a breaking change do you think? Is this behaviour more helpful to users, or do we think it would be confusing?
>> 
>> (If I get a few thumbs up I will update the docs in my PR to cover this new behaviour.)
>> 
>> -ash 
>> 
> 


Re: Auto-registering of DAGs in DAG file? (no `as dag` needed?)

Posted by Jarek Potiuk <ja...@potiuk.com>.
Less is more. I like the lack of "as dag".

I think it's not really a breaking change.

We can easily argue we are adding functionality rather than introducing a
breaking change. I think the only reason why someone would create a DAG in
a function is to return it and eventually add it to globals(). That's what
also a number of guides on the internet will tell people to do.

And if they did that, that is no-change for them - the DAG will be added to
globals automatically, adding it again is a no-op.

Surely it needs an explanation, something along the lines "It used to be
needed to add to globals, now it's not. If you actually created a DAG and
did not add it to globals, then you really have to rethink what and why you
are doing it :).

J..



On Tue, Aug 2, 2022 at 2:09 PM Felix Uellendall <fe...@pm.me.invalid>
wrote:

> Ah I should have checked your PR, sorry. I was looking at the first
> example. In general I like the idea of removing the `as dag` in the context
> manager syntax.
>
> Best,
> Felix
>
>
>
> Sent with Proton Mail <https://proton.me/> secure email.
>
> ------- Original Message -------
> On Tuesday, August 2nd, 2022 at 13:29, Pankaj Koti <
> pankajkoti699@gmail.com> wrote:
>
> Will this impact DAG file processing time?
>
> If we consider to include the change, we might also need to consider
> informing the user that such functions need to be lightweight inline with
> what we've here for top-level-code best practices:
> https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code
>
> On Tue, 2 Aug 2022 at 16:48, Felix Uellendall <fe...@pm.me.invalid>
> wrote:
>
>> Hey Ash,
>>
>> I personally don't like it, because it is not obvious to me.
>>
>> Also what happens if you return the `dag_2` variable and set the return
>> value in the global context to `dag_2` as well? This is how I used to do it
>> when generating DAGs - and in my opinion this is pythonic way of doing it
>> without any magic. I mean magic is nice as long as it works..
>>
>> Keep in mind that also some people use functions to hide the dag creation
>> i.e. factory pattern to clearly separate it from callers context (e.g.
>> business logic). Your solution would blurry this line.
>>
>> So I am leaning towards a "No", but keen to know what others think :)
>>
>> Best,
>> Felix
>>
>>
>> Sent with Proton Mail <https://proton.me/> secure email.
>>
>> ------- Original Message -------
>> On Tuesday, August 2nd, 2022 at 12:43, Ash Berlin-Taylor <as...@apache.org>
>> wrote:
>>
>> Hello all,
>>
>> I'm on a bit of a kick thinking about developer (specifically DAG author)
>> experience and if there is anything we can
>>
>> Some time ago there was a previous conversation about if we should/could
>> "autoregister" DAGs, rather than just looking at the objects in the top
>> level (globals()) of a file, an I knocked up this PR
>>
>> https://github.com/apache/airflow/pull/23592
>>
>> The question I have for you all is do we think this is good idea? It does
>> somewhat subtly change the behaviour in a few cases. Lets take this example
>> this from the docs
>> https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags
>>
>> dag_1 = DAG('this_dag_will_be_discovered')
>> def my_function():
>>     dag_2 = DAG('but_this_dag_will_not')
>>
>> my_function()
>>
>> As implemented right now the second dag won't get picked up (as the auto
>> registration is handled in the context manager, but if the example was
>> changed to use a context manager it will get loaded/discovered:
>>
>> with DAG('this_dag_will_be_discovered'):
>>
>>     EmptyOperator(task_id='task')
>>
>>
>> def my_function(): with DAG('so_will_this_dag_now'):
>>
>>         EmptyOperator(task_id='task')
>>
>>
>> my_function()
>>
>> With the change in my PR both DAGs would be picked up. Does that count
>> as a breaking change do you think? Is this behaviour more helpful to users,
>> or do we think it would be confusing?
>>
>> (If I get a few thumbs up I will update the docs in my PR to cover this
>> new behaviour.)
>>
>> -ash
>>
>>
>>
>
> --
> Best regards,
> Pankaj Koti
> +91 97300 79985
>
>
>

Re: Auto-registering of DAGs in DAG file? (no `as dag` needed?)

Posted by Felix Uellendall <fe...@pm.me.INVALID>.
Ah I should have checked your PR, sorry. I was looking at the first example. In general I like the idea of removing the `as dag` in the context manager syntax.

Best,Felix

Sent with [Proton Mail](https://proton.me/) secure email.

------- Original Message -------
On Tuesday, August 2nd, 2022 at 13:29, Pankaj Koti <pa...@gmail.com> wrote:

> Will this impact DAG file processing time?
>
> If we consider to include the change, we might also need to consider informing the user that such functions need to be lightweight inline with what we've here for top-level-code best practices: https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code
>
> On Tue, 2 Aug 2022 at 16:48, Felix Uellendall <fe...@pm.me.invalid> wrote:
>
>> Hey Ash,
>>
>> I personally don't like it, because it is not obvious to me.
>>
>> Also what happens if you return the `dag_2` variable and set the return value in the global context to `dag_2` as well? This is how I used to do it when generating DAGs - and in my opinion this is pythonic way of doing it without any magic. I mean magic is nice as long as it works..
>>
>> Keep in mind that also some people use functions to hide the dag creation i.e. factory pattern to clearly separate it from callers context (e.g. business logic). Your solution would blurry this line.
>>
>> So I am leaning towards a "No", but keen to know what others think :)
>>
>> Best,
>> Felix
>>
>> Sent with [Proton Mail](https://proton.me/) secure email.
>>
>> ------- Original Message -------
>> On Tuesday, August 2nd, 2022 at 12:43, Ash Berlin-Taylor <as...@apache.org> wrote:
>>
>>> Hello all,
>>>
>>> I'm on a bit of a kick thinking about developer (specifically DAG author) experience and if there is anything we can
>>>
>>> Some time ago there was a previous conversation about if we should/could "autoregister" DAGs, rather than just looking at the objects in the top level (globals()) of a file, an I knocked up this PR
>>>
>>> https://github.com/apache/airflow/pull/23592
>>>
>>> The question I have for you all is do we think this is good idea? It does somewhat subtly change the behaviour in a few cases. Lets take this example this from the docs https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags
>>>
>>> dag_1
>>>
>>> =
>>>
>>> DAG
>>>
>>> (
>>>
>>> 'this_dag_will_be_discovered'
>>>
>>> )
>>>
>>> def
>>>
>>> my_function
>>>
>>> ():
>>>
>>> dag_2
>>>
>>> =
>>>
>>> DAG
>>>
>>> (
>>>
>>> 'but_this_dag_will_not'
>>>
>>> )
>>>
>>> my_function()
>>>
>>> As implemented right now the second dag won't get picked up (as the auto registration is handled in the context manager, but if the example was changed to use a context manager it will get loaded/discovered:
>>>
>>> with
>>>
>>> DAG
>>>
>>> (
>>>
>>> 'this_dag_will_be_discovered'
>>>
>>> ):
>>>
>>> EmptyOperator(task_id='task')
>>>
>>> def
>>>
>>> my_function
>>>
>>> ():
>>>
>>> with
>>>
>>> DAG
>>>
>>> (
>>>
>>> 'so_will_this_dag_now'
>>>
>>> ):
>>>
>>> EmptyOperator(task_id='task')
>>>
>>> my_function()
>>>
>>> With the change in my PR both DAGs would be picked up. Does that count as a breaking change do you think? Is this behaviour more helpful to users, or do we think it would be confusing?
>>>
>>> (If I get a few thumbs up I will update the docs in my PR to cover this new behaviour.)
>>>
>>> -ash
>
> --
>
> Best regards,
> Pankaj Koti
> +91 97300 79985

Re: Auto-registering of DAGs in DAG file? (no `as dag` needed?)

Posted by Pankaj Koti <pa...@gmail.com>.
Will this impact DAG file processing time?

If we consider to include the change, we might also need to consider
informing the user that such functions need to be lightweight inline with
what we've here for top-level-code best practices:
https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code

On Tue, 2 Aug 2022 at 16:48, Felix Uellendall <fe...@pm.me.invalid>
wrote:

> Hey Ash,
>
> I personally don't like it, because it is not obvious to me.
>
> Also what happens if you return the `dag_2` variable and set the return
> value in the global context to `dag_2` as well? This is how I used to do it
> when generating DAGs - and in my opinion this is pythonic way of doing it
> without any magic. I mean magic is nice as long as it works..
>
> Keep in mind that also some people use functions to hide the dag creation
> i.e. factory pattern to clearly separate it from callers context (e.g.
> business logic). Your solution would blurry this line.
>
> So I am leaning towards a "No", but keen to know what others think :)
>
> Best,
> Felix
>
>
> Sent with Proton Mail <https://proton.me/> secure email.
>
> ------- Original Message -------
> On Tuesday, August 2nd, 2022 at 12:43, Ash Berlin-Taylor <as...@apache.org>
> wrote:
>
> Hello all,
>
> I'm on a bit of a kick thinking about developer (specifically DAG author)
> experience  and if there is anything we can
>
> Some time ago there was a previous conversation about if we should/could
> "autoregister" DAGs, rather than just looking at the objects in the top
> level (globals()) of a file, an I knocked up this PR
>
> https://github.com/apache/airflow/pull/23592
>
> The question I have for you all is do we think this is  good idea? It does
> somewhat subtly change the behaviour in a few cases. Lets take this example
> this from the docs
> https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags
>
> dag_1 = DAG('this_dag_will_be_discovered')
> def my_function():
>     dag_2 = DAG('but_this_dag_will_not')
>
>
> my_function()
>
> As implemented right now the second dag won't get picked up (as the auto
> registration is handled in the context manager, but if the example was
> changed to use a context manager it will get loaded/discovered:
>
> with DAG('this_dag_will_be_discovered'):
>
>     EmptyOperator(task_id='task')
>
>
> def my_function(): with DAG('so_will_this_dag_now'):
>
>         EmptyOperator(task_id='task')
>
>
> my_function()
>
> With the change in my PR both DAGs would be picked up. Does that count as
> a breaking change do you think? Is this behaviour more helpful to users, or
> do we think it would be confusing?
>
> (If I get a few thumbs up I will update the docs in my PR to cover this
> new behaviour.)
>
> -ash
>
>
>

-- 
Best regards,
Pankaj Koti
+91 97300 79985

Re: Auto-registering of DAGs in DAG file? (no `as dag` needed?)

Posted by Felix Uellendall <fe...@pm.me.INVALID>.
Hey Ash,

I personally don't like it, because it is not obvious to me.

Also what happens if you return the `dag_2` variable and set the return value in the global context to `dag_2` as well? This is how I used to do it when generating DAGs - and in my opinion this is pythonic way of doing it without any magic. I mean magic is nice as long as it works..

Keep in mind that also some people use functions to hide the dag creation i.e. factory pattern to clearly separate it from callers context (e.g. business logic). Your solution would blurry this line.

So I am leaning towards a "No", but keen to know what others think :)

Best,
Felix

Sent with [Proton Mail](https://proton.me/) secure email.

------- Original Message -------
On Tuesday, August 2nd, 2022 at 12:43, Ash Berlin-Taylor <as...@apache.org> wrote:

> Hello all,
>
> I'm on a bit of a kick thinking about developer (specifically DAG author) experience and if there is anything we can
>
> Some time ago there was a previous conversation about if we should/could "autoregister" DAGs, rather than just looking at the objects in the top level (globals()) of a file, an I knocked up this PR
>
> https://github.com/apache/airflow/pull/23592
>
> The question I have for you all is do we think this is good idea? It does somewhat subtly change the behaviour in a few cases. Lets take this example this from the docs https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags
>
> dag_1
>
> =
>
> DAG
>
> (
>
> 'this_dag_will_be_discovered'
>
> )
>
> def
>
> my_function
>
> ():
>
> dag_2
>
> =
>
> DAG
>
> (
>
> 'but_this_dag_will_not'
>
> )
>
> my_function()
>
> As implemented right now the second dag won't get picked up (as the auto registration is handled in the context manager, but if the example was changed to use a context manager it will get loaded/discovered:
>
> with
>
> DAG
>
> (
>
> 'this_dag_will_be_discovered'
>
> ):
>
> EmptyOperator(task_id='task')
>
> def
>
> my_function
>
> ():
>
> with
>
> DAG
>
> (
>
> 'so_will_this_dag_now'
>
> ):
>
> EmptyOperator(task_id='task')
>
> my_function()
>
> With the change in my PR both DAGs would be picked up. Does that count as a breaking change do you think? Is this behaviour more helpful to users, or do we think it would be confusing?
>
> (If I get a few thumbs up I will update the docs in my PR to cover this new behaviour.)
>
> -ash

Re: Auto-registering of DAGs in DAG file? (no `as dag` needed?)

Posted by Vikram Koka <vi...@astronomer.io.INVALID>.
+1
Similar view to Jed. I view this as adding a feature, rather than
breaking compatibility.


On Thu, Aug 4, 2022 at 3:20 PM Jed Cunningham <je...@apache.org>
wrote:

> +1. I view it as adding a feature vs breaking compatibility.
>
> On Thu, Aug 4, 2022 at 4:15 PM Ferruzzi, Dennis
> <fe...@amazon.com.invalid> wrote:
>
>> I definitely like it, I love reducing boilerplate code like that.
>>
>>
>> ------------------------------
>> *From:* Ash Berlin-Taylor <as...@apache.org>
>> *Sent:* Tuesday, August 2, 2022 3:43 AM
>> *To:* dev@airflow.apache.org
>> *Subject:* [EXTERNAL] Auto-registering of DAGs in DAG file? (no `as dag`
>> needed?)
>>
>>
>> *CAUTION*: This email originated from outside of the organization. Do
>> not click links or open attachments unless you can confirm the sender and
>> know the content is safe.
>>
>> Hello all,
>>
>> I'm on a bit of a kick thinking about developer (specifically DAG author)
>> experience  and if there is anything we can
>>
>> Some time ago there was a previous conversation about if we should/could
>> "autoregister" DAGs, rather than just looking at the objects in the top
>> level (globals()) of a file, an I knocked up this PR
>>
>> https://github.com/apache/airflow/pull/23592
>>
>> The question I have for you all is do we think this is  good idea? It
>> does somewhat subtly change the behaviour in a few cases. Lets take this
>> example this from the docs
>> https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags
>>
>> dag_1 = DAG('this_dag_will_be_discovered')
>> def my_function():
>>     dag_2 = DAG('but_this_dag_will_not')
>>
>>
>> my_function()
>>
>> As implemented right now the second dag won't get picked up (as the auto
>> registration is handled in the context manager, but if the example was
>> changed to use a context manager it will get loaded/discovered:
>>
>> with DAG('this_dag_will_be_discovered'):
>>
>>     EmptyOperator(task_id='task')
>>
>>
>> def my_function(): with DAG('so_will_this_dag_now'):
>>
>>         EmptyOperator(task_id='task')
>>
>>
>> my_function()
>>
>> With the change in my PR both DAGs would be picked up. Does that count
>> as a breaking change do you think? Is this behaviour more helpful to users,
>> or do we think it would be confusing?
>>
>> (If I get a few thumbs up I will update the docs in my PR to cover this
>> new behaviour.)
>>
>> -ash
>>
>>

Re: Auto-registering of DAGs in DAG file? (no `as dag` needed?)

Posted by Jed Cunningham <je...@apache.org>.
+1. I view it as adding a feature vs breaking compatibility.

On Thu, Aug 4, 2022 at 4:15 PM Ferruzzi, Dennis <fe...@amazon.com.invalid>
wrote:

> I definitely like it, I love reducing boilerplate code like that.
>
>
> ------------------------------
> *From:* Ash Berlin-Taylor <as...@apache.org>
> *Sent:* Tuesday, August 2, 2022 3:43 AM
> *To:* dev@airflow.apache.org
> *Subject:* [EXTERNAL] Auto-registering of DAGs in DAG file? (no `as dag`
> needed?)
>
>
> *CAUTION*: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
> Hello all,
>
> I'm on a bit of a kick thinking about developer (specifically DAG author)
> experience  and if there is anything we can
>
> Some time ago there was a previous conversation about if we should/could
> "autoregister" DAGs, rather than just looking at the objects in the top
> level (globals()) of a file, an I knocked up this PR
>
> https://github.com/apache/airflow/pull/23592
>
> The question I have for you all is do we think this is  good idea? It does
> somewhat subtly change the behaviour in a few cases. Lets take this example
> this from the docs
> https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags
>
> dag_1 = DAG('this_dag_will_be_discovered')
> def my_function():
>     dag_2 = DAG('but_this_dag_will_not')
>
>
> my_function()
>
> As implemented right now the second dag won't get picked up (as the auto
> registration is handled in the context manager, but if the example was
> changed to use a context manager it will get loaded/discovered:
>
> with DAG('this_dag_will_be_discovered'):
>
>     EmptyOperator(task_id='task')
>
>
> def my_function(): with DAG('so_will_this_dag_now'):
>
>         EmptyOperator(task_id='task')
>
>
> my_function()
>
> With the change in my PR both DAGs would be picked up. Does that count as
> a breaking change do you think? Is this behaviour more helpful to users, or
> do we think it would be confusing?
>
> (If I get a few thumbs up I will update the docs in my PR to cover this
> new behaviour.)
>
> -ash
>
>

Re: Auto-registering of DAGs in DAG file? (no `as dag` needed?)

Posted by "Ferruzzi, Dennis" <fe...@amazon.com.INVALID>.
I definitely like it, I love reducing boilerplate code like that.


________________________________
From: Ash Berlin-Taylor <as...@apache.org>
Sent: Tuesday, August 2, 2022 3:43 AM
To: dev@airflow.apache.org
Subject: [EXTERNAL] Auto-registering of DAGs in DAG file? (no `as dag` needed?)


CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.


Hello all,

I'm on a bit of a kick thinking about developer (specifically DAG author) experience  and if there is anything we can

Some time ago there was a previous conversation about if we should/could "autoregister" DAGs, rather than just looking at the objects in the top level (globals()) of a file, an I knocked up this PR

https://github.com/apache/airflow/pull/23592

The question I have for you all is do we think this is  good idea? It does somewhat subtly change the behaviour in a few cases. Lets take this example this from the docs https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#loading-dags


dag_1 = DAG('this_dag_will_be_discovered')

def my_function():
    dag_2 = DAG('but_this_dag_will_not')


my_function()

As implemented right now the second dag won't get picked up (as the auto registration is handled in the context manager, but if the example was changed to use a context manager it will get loaded/discovered:


with DAG('this_dag_will_be_discovered'):

    EmptyOperator(task_id='task')


def my_function(): with DAG('so_will_this_dag_now'):

        EmptyOperator(task_id='task')


my_function()

With the change in my PR both DAGs would be picked up. Does that count as a breaking change do you think? Is this behaviour more helpful to users, or do we think it would be confusing?

(If I get a few thumbs up I will update the docs in my PR to cover this new behaviour.)

-ash