You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Khalid Mammadov <kh...@gmail.com> on 2021/10/03 21:21:53 UTC

DAG public API en-query

Hi Devs,


I was reviewing DAG class and noticed that almost all it's methods are 
public.

So, one can do something like below:

with DAG(...)as dag:
t1 = BashOperator(...)

     
run_id = DagRun.generate_run_id(DagRunType.MANUAL, datetime.utcnow())
     
     
     # This one works OK and create a DagRun for the Scheduler to pick up
     dag.create_dagrun(state=DagRunState.QUEUED, run_id=run_id)

     # OR EVEN DO BELOW - which caused my laptop to run on 100% CPU
     #dag.run()

And I was wondering if this is intentional and/or expected behavior?


Thanks,

Khalid


Re: DAG public API en-query

Posted by Khalid Mammadov <kh...@gmail.com>.
Thanks Jarek for clarification and information. I will create specific PRs
and we can discuss there. If I come up with bigger change idea then I will
file AIP.

On Mon, 4 Oct 2021, 13:27 Jarek Potiuk, <ja...@potiuk.com> wrote:

> I think you need to avoid the "generalizations" and dive into which
> particular methods do you have in mind.  I think if you pin-point some
> methods that are indeed "internal" and "protected" - they could be
> deprecated, and later removed and if you can find some, I'd encourage you
> to PR those.
>
> There are already some methods in DAG that are protected or "internal" .
> But most of the methods I could see there are "meant" to be publicly
> available - DAG is the public interface Airflow exposed to manage
> all-things-dag, so the vast majority of those methods is
> intentionally public and should be used by whoever uses DAG as an object. I
> do not know all the methods by-heart, so maybe there are some that are
> really "internal" though.
>
> Also even if there are some methods already "public" there
> 'unintentionally", we have to be really careful not to rename/hide such
> methods because someone, somewhere could have used them already and we can
> break someone's DAGs. This is pretty much the public API of Airflow - and
> the most important one - used by everyone who uses Airflow so changing that
> one should be really, really, really careful.
>
> But if you can pinpoint some methods that are clearly private and we all
> agree that it is very likely to be  made private - we can deprecate such
> methods and remove them in Airflow 3.0 (which will be ~ 6/12 months from
> now).
> We have the rule that we cannot remove stuff from the "Public" part of
> Airflow as we are strictly following Semver, so our approach is that we
> deprecate such methods/classes and remove them in the next "major" version.
>
> If you can review the code and clearly pin-point those individual methods
> that could be made private, then simply create PRs which deprecates such
> methods (Ideally one-per-method) and we can likely discuss if it makes or
> does not make sense to make those methods internal/protected.
>
> J.
>
>
> On Mon, Oct 4, 2021 at 10:12 AM Khalid Mammadov <kh...@gmail.com>
> wrote:
>
>> Another thing I was thinking that to move out those methods altogether to
>> a separate class or module and so execution of a dag or those methods is
>> done from different place and then Dag class will be very slim and will
>> only be used to describe definition of a Dag.
>>
>> On Mon, 4 Oct 2021, 08:33 Khalid Mammadov, <kh...@gmail.com>
>> wrote:
>>
>>> I was thinking to follow as you mentioned the very few Pythonic
>>> notations for private methods i.e. just add an underscore in front of it
>>> this example to signal a user/dev this is private and an internal
>>> implementation and not part of public API or otherwise it's.
>>>
>>> On Sun, 3 Oct 2021, 23:50 Jarek Potiuk, <ja...@potiuk.com> wrote:
>>>
>>>> Well. Not sure what else you'd expect, I wonder ?
>>>>
>>>> This is for sure unexpected and not reasonable use of it. There are
>>>> best practices to follow and things to avoid when you run top-level python
>>>> code
>>>> https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code
>>>> but if you want you can do anything.
>>>> This is Python, we can't prevent you from doing pretty much
>>>> anything with it if you want. There are no "truly" private methods in
>>>> Python. Also you can do that:
>>>>
>>>> ```
>>>> while True:
>>>>     pass
>>>> ```
>>>>
>>>> and you will get your CPU at 100% too,
>>>>
>>>> J,.
>>>>
>>>> On Sun, Oct 3, 2021 at 11:22 PM Khalid Mammadov <
>>>> khalidmammadov9@gmail.com> wrote:
>>>>
>>>>> Hi Devs,
>>>>>
>>>>>
>>>>> I was reviewing DAG class and noticed that almost all it's methods are
>>>>> public.
>>>>>
>>>>> So, one can do something like below:
>>>>>
>>>>> with DAG(...) as dag:    t1 = BashOperator(...)
>>>>>
>>>>>         run_id = DagRun.generate_run_id(DagRunType.MANUAL, datetime.utcnow())
>>>>>
>>>>>
>>>>>     # This one works OK and create a DagRun for the Scheduler to pick up
>>>>>     dag.create_dagrun(state=DagRunState.QUEUED, run_id=run_id)
>>>>>
>>>>>     # OR EVEN DO BELOW - which caused my laptop to run on 100% CPU
>>>>>     # dag.run()
>>>>>
>>>>> And I was wondering if this is intentional and/or expected behavior?
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Khalid
>>>>>
>>>>

Re: DAG public API en-query

Posted by Jarek Potiuk <ja...@potiuk.com>.
I think you need to avoid the "generalizations" and dive into which
particular methods do you have in mind.  I think if you pin-point some
methods that are indeed "internal" and "protected" - they could be
deprecated, and later removed and if you can find some, I'd encourage you
to PR those.

There are already some methods in DAG that are protected or "internal" .
But most of the methods I could see there are "meant" to be publicly
available - DAG is the public interface Airflow exposed to manage
all-things-dag, so the vast majority of those methods is
intentionally public and should be used by whoever uses DAG as an object. I
do not know all the methods by-heart, so maybe there are some that are
really "internal" though.

Also even if there are some methods already "public" there
'unintentionally", we have to be really careful not to rename/hide such
methods because someone, somewhere could have used them already and we can
break someone's DAGs. This is pretty much the public API of Airflow - and
the most important one - used by everyone who uses Airflow so changing that
one should be really, really, really careful.

But if you can pinpoint some methods that are clearly private and we all
agree that it is very likely to be  made private - we can deprecate such
methods and remove them in Airflow 3.0 (which will be ~ 6/12 months from
now).
We have the rule that we cannot remove stuff from the "Public" part of
Airflow as we are strictly following Semver, so our approach is that we
deprecate such methods/classes and remove them in the next "major" version.

If you can review the code and clearly pin-point those individual methods
that could be made private, then simply create PRs which deprecates such
methods (Ideally one-per-method) and we can likely discuss if it makes or
does not make sense to make those methods internal/protected.

J.


On Mon, Oct 4, 2021 at 10:12 AM Khalid Mammadov <kh...@gmail.com>
wrote:

> Another thing I was thinking that to move out those methods altogether to
> a separate class or module and so execution of a dag or those methods is
> done from different place and then Dag class will be very slim and will
> only be used to describe definition of a Dag.
>
> On Mon, 4 Oct 2021, 08:33 Khalid Mammadov, <kh...@gmail.com>
> wrote:
>
>> I was thinking to follow as you mentioned the very few Pythonic notations
>> for private methods i.e. just add an underscore in front of it this example
>> to signal a user/dev this is private and an internal implementation and not
>> part of public API or otherwise it's.
>>
>> On Sun, 3 Oct 2021, 23:50 Jarek Potiuk, <ja...@potiuk.com> wrote:
>>
>>> Well. Not sure what else you'd expect, I wonder ?
>>>
>>> This is for sure unexpected and not reasonable use of it. There are best
>>> practices to follow and things to avoid when you run top-level python code
>>> https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code
>>> but if you want you can do anything.
>>> This is Python, we can't prevent you from doing pretty much
>>> anything with it if you want. There are no "truly" private methods in
>>> Python. Also you can do that:
>>>
>>> ```
>>> while True:
>>>     pass
>>> ```
>>>
>>> and you will get your CPU at 100% too,
>>>
>>> J,.
>>>
>>> On Sun, Oct 3, 2021 at 11:22 PM Khalid Mammadov <
>>> khalidmammadov9@gmail.com> wrote:
>>>
>>>> Hi Devs,
>>>>
>>>>
>>>> I was reviewing DAG class and noticed that almost all it's methods are
>>>> public.
>>>>
>>>> So, one can do something like below:
>>>>
>>>> with DAG(...) as dag:    t1 = BashOperator(...)
>>>>
>>>>         run_id = DagRun.generate_run_id(DagRunType.MANUAL, datetime.utcnow())
>>>>
>>>>
>>>>     # This one works OK and create a DagRun for the Scheduler to pick up
>>>>     dag.create_dagrun(state=DagRunState.QUEUED, run_id=run_id)
>>>>
>>>>     # OR EVEN DO BELOW - which caused my laptop to run on 100% CPU
>>>>     # dag.run()
>>>>
>>>> And I was wondering if this is intentional and/or expected behavior?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Khalid
>>>>
>>>

Re: DAG public API en-query

Posted by Khalid Mammadov <kh...@gmail.com>.
Another thing I was thinking that to move out those methods altogether to a
separate class or module and so execution of a dag or those methods is done
from different place and then Dag class will be very slim and will only be
used to describe definition of a Dag.

On Mon, 4 Oct 2021, 08:33 Khalid Mammadov, <kh...@gmail.com>
wrote:

> I was thinking to follow as you mentioned the very few Pythonic notations
> for private methods i.e. just add an underscore in front of it this example
> to signal a user/dev this is private and an internal implementation and not
> part of public API or otherwise it's.
>
> On Sun, 3 Oct 2021, 23:50 Jarek Potiuk, <ja...@potiuk.com> wrote:
>
>> Well. Not sure what else you'd expect, I wonder ?
>>
>> This is for sure unexpected and not reasonable use of it. There are best
>> practices to follow and things to avoid when you run top-level python code
>> https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code
>> but if you want you can do anything.
>> This is Python, we can't prevent you from doing pretty much anything with
>> it if you want. There are no "truly" private methods in Python. Also you
>> can do that:
>>
>> ```
>> while True:
>>     pass
>> ```
>>
>> and you will get your CPU at 100% too,
>>
>> J,.
>>
>> On Sun, Oct 3, 2021 at 11:22 PM Khalid Mammadov <
>> khalidmammadov9@gmail.com> wrote:
>>
>>> Hi Devs,
>>>
>>>
>>> I was reviewing DAG class and noticed that almost all it's methods are
>>> public.
>>>
>>> So, one can do something like below:
>>>
>>> with DAG(...) as dag:    t1 = BashOperator(...)
>>>
>>>         run_id = DagRun.generate_run_id(DagRunType.MANUAL, datetime.utcnow())
>>>
>>>
>>>     # This one works OK and create a DagRun for the Scheduler to pick up
>>>     dag.create_dagrun(state=DagRunState.QUEUED, run_id=run_id)
>>>
>>>     # OR EVEN DO BELOW - which caused my laptop to run on 100% CPU
>>>     # dag.run()
>>>
>>> And I was wondering if this is intentional and/or expected behavior?
>>>
>>>
>>> Thanks,
>>>
>>> Khalid
>>>
>>

Re: DAG public API en-query

Posted by Khalid Mammadov <kh...@gmail.com>.
I was thinking to follow as you mentioned the very few Pythonic notations
for private methods i.e. just add an underscore in front of it this example
to signal a user/dev this is private and an internal implementation and not
part of public API or otherwise it's.

On Sun, 3 Oct 2021, 23:50 Jarek Potiuk, <ja...@potiuk.com> wrote:

> Well. Not sure what else you'd expect, I wonder ?
>
> This is for sure unexpected and not reasonable use of it. There are best
> practices to follow and things to avoid when you run top-level python code
> https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code
> but if you want you can do anything.
> This is Python, we can't prevent you from doing pretty much anything with
> it if you want. There are no "truly" private methods in Python. Also you
> can do that:
>
> ```
> while True:
>     pass
> ```
>
> and you will get your CPU at 100% too,
>
> J,.
>
> On Sun, Oct 3, 2021 at 11:22 PM Khalid Mammadov <kh...@gmail.com>
> wrote:
>
>> Hi Devs,
>>
>>
>> I was reviewing DAG class and noticed that almost all it's methods are
>> public.
>>
>> So, one can do something like below:
>>
>> with DAG(...) as dag:    t1 = BashOperator(...)
>>
>>         run_id = DagRun.generate_run_id(DagRunType.MANUAL, datetime.utcnow())
>>
>>
>>     # This one works OK and create a DagRun for the Scheduler to pick up
>>     dag.create_dagrun(state=DagRunState.QUEUED, run_id=run_id)
>>
>>     # OR EVEN DO BELOW - which caused my laptop to run on 100% CPU
>>     # dag.run()
>>
>> And I was wondering if this is intentional and/or expected behavior?
>>
>>
>> Thanks,
>>
>> Khalid
>>
>

Re: DAG public API en-query

Posted by Jarek Potiuk <ja...@potiuk.com>.
Well. Not sure what else you'd expect, I wonder ?

This is for sure unexpected and not reasonable use of it. There are best
practices to follow and things to avoid when you run top-level python code
https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code
but if you want you can do anything.
This is Python, we can't prevent you from doing pretty much anything with
it if you want. There are no "truly" private methods in Python. Also you
can do that:

```
while True:
    pass
```

and you will get your CPU at 100% too,

J,.

On Sun, Oct 3, 2021 at 11:22 PM Khalid Mammadov <kh...@gmail.com>
wrote:

> Hi Devs,
>
>
> I was reviewing DAG class and noticed that almost all it's methods are
> public.
>
> So, one can do something like below:
>
> with DAG(...) as dag:    t1 = BashOperator(...)
>
>         run_id = DagRun.generate_run_id(DagRunType.MANUAL, datetime.utcnow())
>
>
>     # This one works OK and create a DagRun for the Scheduler to pick up
>     dag.create_dagrun(state=DagRunState.QUEUED, run_id=run_id)
>
>     # OR EVEN DO BELOW - which caused my laptop to run on 100% CPU
>     # dag.run()
>
> And I was wondering if this is intentional and/or expected behavior?
>
>
> Thanks,
>
> Khalid
>