Posted to users@airflow.apache.org by Chris Redekop <ch...@replicon.com> on 2022/03/22 23:20:42 UTC

How does DAG loading isolation work?

I have a bunch of DAGs up and running, and they're all working fine. Each
exists in its own directory, and they're often split up into multiple files
for cleanliness/organization - all files for each DAG reside in its own
subdirectory and they don't share anything. The way they reference their
files is like so:
    import sys, os
    sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))
    import my_other_file_in_this_dir

It works nicely...even if multiple DAGs have files with the same names (like,
they each have a "config.py") Airflow keeps them all isolated, and they
behave just how you would want and expect them to...Awesome! Now I've
decided I'm going to add some simple DAG validation tests as part of my
CI/CD...so I wrote up a test much like this:
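    from airflow.models import DagBag

    def test_no_import_errors():
        # path_to_my_dags is the DAG folder under test, defined elsewhere
        dag_bag = DagBag(dag_folder=path_to_my_dags, include_examples=False)
        assert len(dag_bag.import_errors) == 0, f"Import failures: {dag_bag.import_errors}"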

...and what the heck, I'm getting import errors like "AttributeError: module
'config' has no attribute 'value2'" because all the config.py files are
conflicting - instead of each DAG getting the config.py in its own dir,
they're all getting the first config.py that happened to get loaded. So I
took a look through the code, and I can't figure out why it actually works
in Airflow - reading the code I would expect the modules to conflict in
Airflow just like they do in the test. Now I'm worried that all my DAGs
are doing something that is completely unsupported and they're only working
in Airflow by some weird fluke. Can anyone offer any insight into how they
actually work in Airflow and/or how I could get my tests to work in the
same way? Is this actually a supported/expected/normal thing to do? I've
uploaded a simple repro of the issue here
https://github.com/repl-chris/airflow-dag-isolation for clarity and/or
playing. Thanks!

- Chris
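
The conflict described above can be reproduced without Airflow at all. The
following is a minimal sketch, assuming two hypothetical DAG directories
(dag1 and dag2, created in a temporary folder) that each contain a config.py
with a different attribute. It only illustrates what happens when both
directories are imported from within a single Python process, which is the
situation in the test; it does not show how Airflow itself loads DAG files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: two DAG directories, each with its own config.py.
root = Path(tempfile.mkdtemp())
for dag_dir, attr in (("dag1", "value1"), ("dag2", "value2")):
    (root / dag_dir).mkdir()
    (root / dag_dir / "config.py").write_text(f"{attr} = {dag_dir!r}\n")

sys.modules.pop("config", None)  # start from a clean module cache

# The first DAG file puts its directory on sys.path and imports config.
sys.path.insert(0, str(root / "dag1"))
importlib.invalidate_caches()
first = importlib.import_module("config")

# The second DAG file does the same, in the same interpreter.
sys.path.insert(0, str(root / "dag2"))
importlib.invalidate_caches()
second = importlib.import_module("config")

# The name "config" is already cached in sys.modules, so the second
# import returns dag1's module instead of reading dag2/config.py.
print(second is first)            # True
print(hasattr(second, "value2"))  # False
```

Accessing second.value2 here would raise the same kind of AttributeError
quoted in the message, because Python caches modules by name and never looks
at the newly inserted sys.path entry for a name it has already loaded.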

Re: How does DAG loading isolation work?

Posted by Stanislav Vohnik <sv...@yahoo.com>.
Yeah, I have to make a new production; this one definitely doesn't have its changes.


> On 23. 3. 2022, at 14:42, Chris Redekop <ch...@replicon.com> wrote:
> 
> 
> ok, thanks all...I will rework them

Re: How does DAG loading isolation work?

Posted by Chris Redekop <ch...@replicon.com>.
ok, thanks all...I will rework them

On Wed, Mar 23, 2022 at 4:36 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Daniel is completely right. You should not use relative imports.
> Airflow does import files in a different way than regular import, for
> multiple reasons. "Don't use relative imports" is even very explicitly
> stated (with examples) in the best practices we have for module
> management (for precisely this reason):
>
>
> https://airflow.apache.org/docs/apache-airflow/stable/modules_management.html?highlight=module%20management#best-practices-for-module-loading
>
> BTW: I recommend you follow all the advice from there.
>
> J.

Re: How does DAG loading isolation work?

Posted by Jarek Potiuk <ja...@potiuk.com>.
Daniel is completely right. You should not use relative imports.
Airflow does import files in a different way than regular import, for
multiple reasons. "Don't use relative imports" is even very explicitly
stated (with examples) in the best practices we have for module
management (for precisely this reason):

https://airflow.apache.org/docs/apache-airflow/stable/modules_management.html?highlight=module%20management#best-practices-for-module-loading

BTW: I recommend you follow all the advice from there.

J.


On Wed, Mar 23, 2022 at 5:07 AM Daniel Standish
<da...@astronomer.io> wrote:
>
> I am not sure why it works in airflow (perhaps dags are parsed in distinct processes???).  But, thinking of how you might fix this.... it seems you are doing relative imports.  Maybe if you stop doing that, that would be enough.  So instead of importing `from config import value` you could do `from dags.my_dir1.my_dir2.config import value`.  Then you wouldn't need to muck with the python path either.
>

Re: How does DAG loading isolation work?

Posted by Daniel Standish <da...@astronomer.io>.
I am not sure why it works in airflow (perhaps dags are parsed in distinct
processes???).  But, thinking of how you might fix this.... it seems you
are doing relative imports.  Maybe if you stop doing that, that would be
enough.  So instead of importing `from config import value` you could do
`from dags.my_dir1.my_dir2.config import value`.  Then you wouldn't need to
muck with the python path either.
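
The suggestion above can be sketched as follows. This is a minimal
illustration under assumptions, not a description of Airflow's loader: the
layout (a "dags" package containing my_dir1 and my_dir2, each with its own
config.py and __init__.py) is hypothetical and built in a temporary folder,
and the parent of "dags" is put on sys.path by hand. In a real deployment the
right import prefix depends on which directory is actually on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: dags/my_dir1/config.py and dags/my_dir2/config.py.
root = Path(tempfile.mkdtemp())
(root / "dags").mkdir()
(root / "dags" / "__init__.py").touch()
for dag_dir, attr in (("my_dir1", "value1"), ("my_dir2", "value2")):
    pkg = root / "dags" / dag_dir
    pkg.mkdir()
    (pkg / "__init__.py").touch()
    (pkg / "config.py").write_text(f"{attr} = {dag_dir!r}\n")

# Only the common parent goes on sys.path, once.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Fully qualified names give each config.py its own entry in sys.modules.
cfg1 = importlib.import_module("dags.my_dir1.config")
cfg2 = importlib.import_module("dags.my_dir2.config")

print(cfg1 is cfg2)              # False
print(cfg1.value1, cfg2.value2)  # my_dir1 my_dir2
```

Because the two modules are cached under different names, loading both in
one process (as a DagBag-based test does) no longer makes one config.py
shadow the other.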