Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/16 19:28:39 UTC

[GitHub] [airflow] peidaqi opened a new issue #12387: Calling models.DagBag().dags within a DAG causes maximum recursion error in Python

peidaqi opened a new issue #12387:
URL: https://github.com/apache/airflow/issues/12387


   **Apache Airflow version**: 1.10.12
   
   **Environment**: macOS Catalina, Python 3.8.6
   
   **What happened**:
   
   I want to get a list of existing DAGs and check whether a DAG already exists before creating it.
   Within the DAG file, I called:
   models.DagBag().dags
   
   This causes a maximum recursion error in Python.
   I need this because a certain dynamic DAG_X can be created by either of two other DAGs, DAG_1 or DAG_2.
   
   
   **What you expected to happen**:
   Get a list of existing DAGs
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb closed issue #12387: Calling models.DagBag().dags within a DAG causes maximum recursion error in Python

Posted by GitBox <gi...@apache.org>.
ashb closed issue #12387:
URL: https://github.com/apache/airflow/issues/12387


   





[GitHub] [airflow] ashb commented on issue #12387: Calling models.DagBag().dags within a DAG causes maximum recursion error in Python

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #12387:
URL: https://github.com/apache/airflow/issues/12387#issuecomment-731852616


   >  I simply just want to get a list of currently registered DAGs.
   
   Why? And why do you want to do this _whenever your DAG is parsed_? (Hint: that is about once a second!) It is considered good practice to avoid any unnecessary code at the top level of your DAG file.
   
   If you _absolutely_ need this at the top level (i.e. outside of a task's execute function), then use the DagModel DB table. DagBag is not designed for this purpose; as you've found out, it is for parsing DAGs from files on disk. You have by definition created a recursion: you create a new instance of DagBag, which parses the DAG file, which then creates a new DagBag, and so on.
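
For illustration, the DagModel lookup suggested above might look roughly like the sketch below. It assumes Airflow 1.10 or 2.x, where `DagModel.get_current(dag_id)` returns the matching row from the metadata database or `None`; the helper name `dag_is_registered` is hypothetical and not part of Airflow.

```python
def dag_is_registered(dag_id):
    """Return True if the metadata DB already has a row for dag_id.

    Sketch only: assumes Airflow 1.10 or 2.x, where
    DagModel.get_current(dag_id) returns the matching row or None.
    """
    # Imported inside the function so that importing this module stays cheap
    # and does not touch Airflow until the check is actually needed.
    from airflow.models import DagModel

    return DagModel.get_current(dag_id) is not None
```

A call such as `dag_is_registered("DAG_X")` reads one row from the database and does not re-parse any DAG file, so it cannot trigger the recursion described above. It is still a database query, so the advice about keeping it out of top-level code applies.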





[GitHub] [airflow] ashb commented on issue #12387: Calling models.DagBag().dags within a DAG causes maximum recursion error in Python

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #12387:
URL: https://github.com/apache/airflow/issues/12387#issuecomment-728305030


   You'll have to explain your use case more -- I can't see from what you've said why you'd even need to get this other dag object.
   
   Also, even if we fixed this so that it no longer raised a recursion error, it might not work how you expect: when DAGs are parsed, only one file is parsed at a time, not all of them.





[GitHub] [airflow] peidaqi commented on issue #12387: Calling models.DagBag().dags within a DAG causes maximum recursion error in Python

Posted by GitBox <gi...@apache.org>.
peidaqi commented on issue #12387:
URL: https://github.com/apache/airflow/issues/12387#issuecomment-731850486


   @maroshmka that looks like a strange solution though. I simply just want to get a list of currently registered DAGs. Since it's a read-only operation, it shouldn't involve something as complicated as a semaphore variable.
   
   The maximum recursion error definitely has something to do with how models.DagBag().dags is implemented. If a call to it triggers a re-read of all the DAGs, then this error is expected, but it is still a bit strange.
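
The suspected mechanism can be reproduced without Airflow. The sketch below is a toy model only: `ToyDagBag`, `DAG_FILE_SOURCE`, and `parse_hits_recursion_limit` are hypothetical names, and the class merely imitates a loader that executes a file's top-level code when it is constructed. It is not Airflow's actual implementation.

```python
# Toy model only: ToyDagBag is a hypothetical stand-in, not Airflow's DagBag.

# The "DAG file": its top-level code builds a new bag, as in the issue.
DAG_FILE_SOURCE = "bag = ToyDagBag()\n"


class ToyDagBag:
    """Parses the 'DAG file' on construction by executing its source."""

    def __init__(self):
        self.dags = {}
        # Executing the file runs its top-level code, which constructs
        # another ToyDagBag, which executes the file again, and so on.
        exec(DAG_FILE_SOURCE, {"ToyDagBag": ToyDagBag})


def parse_hits_recursion_limit():
    """Return True if building a ToyDagBag ends in a RecursionError."""
    try:
        ToyDagBag()
    except RecursionError:
        return True
    return False
```

In this model the error does not depend on how many files exist: a single file whose top-level code constructs the loader is enough to recurse until Python's recursion limit is reached.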





[GitHub] [airflow] ashb commented on issue #12387: Calling models.DagBag().dags within a DAG causes maximum recursion error in Python

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #12387:
URL: https://github.com/apache/airflow/issues/12387#issuecomment-731393299


   Better yet: don't execute expensive code at the top level of a DAG file -- it has to be run to parse the dag, which slows things down.
   
   
   In 2.0 it slows things down a lot less than it used to, but it would still slow down every task that executes in this dag unnecessarily.
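
As a rough illustration of that advice, the plain-Python sketch below contrasts parsing with execution. All names in it (`expensive_lookup`, `build_task`, `parse_dag_file`) are hypothetical stand-ins; it only models the idea that a parse should define the work while the task run performs it.

```python
call_log = []


def expensive_lookup():
    """Stand-in for costly work such as listing every registered DAG."""
    call_log.append("lookup")
    return ["dag_1", "dag_2"]


def build_task():
    # Defining the callable is cheap; the lookup runs only when the
    # returned callable is invoked, i.e. when the task executes.
    def task_callable():
        return expensive_lookup()

    return task_callable


def parse_dag_file():
    """Stand-in for one parse of a DAG file: it only builds the task."""
    return build_task()


tasks = [parse_dag_file() for _ in range(3)]  # three parses
lookups_during_parsing = len(call_log)  # no lookup has run yet
result = tasks[0]()  # the lookup runs once, at execution time
```

Had `expensive_lookup()` been called directly inside `parse_dag_file`, it would have run once per parse instead of once per task execution.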





[GitHub] [airflow] boring-cyborg[bot] commented on issue #12387: Calling models.DagBag().dags within a DAG causes maximum recursion error in Python

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #12387:
URL: https://github.com/apache/airflow/issues/12387#issuecomment-728275132


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] maroshmka commented on issue #12387: Calling models.DagBag().dags within a DAG causes maximum recursion error in Python

Posted by GitBox <gi...@apache.org>.
maroshmka commented on issue #12387:
URL: https://github.com/apache/airflow/issues/12387#issuecomment-731392455


   I wonder whether this is more of a client-side issue. You're trying to read files while one of those files is the one that started the reading. To me, a maximum recursion error sounds like the correct error here: you're doing infinite recursion, so you should be aware of that and control it in your code.
   
   I would handle this with a semaphore variable that decides whether to skip the file:
   
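   ```python
   import os
   
   from airflow import models  # import added; the original snippet used `models` without it
   
   # Env var used as a process-wide "load in progress" flag.
   loading = int(os.environ.get("AIRFLOW_DAGBAG_LOAD_GLOBAL_ACTIVE", "0"))
   
   if loading == 0:
       # Set the flag before loading so the nested parse of this file skips.
       # Note: the flag is never cleared afterwards.
       os.environ["AIRFLOW_DAGBAG_LOAD_GLOBAL_ACTIVE"] = "1"
       dagbag = models.DagBag().dags
       print(dagbag)
   else:
       print("skipping this file")
   ```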
   
   The variable could probably also be stored in `globals()`, though I haven't tested that. The posted code should work.
   
   Just be aware that the flag lives in global scope (the example uses environment variables), which should be used with care. If you use it in multiple places, you would need more elaborate management of what should and shouldn't be read.
   
   Hope it helps a bit.
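
One possible refinement of such a guard, sketched in plain Python so it can run without Airflow: clear the flag in a `finally` block so that later loads in the same process are not skipped. The variable name `TOY_DAGBAG_LOAD_ACTIVE` and the helpers `load_once` and `toy_loader` are hypothetical, and `toy_loader` only simulates a DAG file that triggers another load while it is being read.

```python
import os

GUARD = "TOY_DAGBAG_LOAD_ACTIVE"  # hypothetical variable name


def load_once(loader):
    """Run loader() unless a load is already in progress in this process."""
    if os.environ.get(GUARD) == "1":
        return None  # re-entrant call: skip instead of recursing
    os.environ[GUARD] = "1"
    try:
        return loader()
    finally:
        # Always clear the flag, even if the loader raises.
        os.environ.pop(GUARD, None)


calls = []


def toy_loader():
    calls.append("load")
    # Simulates the DAG file triggering another load while being read.
    load_once(toy_loader)
    return ["dag_1", "dag_2"]


result = load_once(toy_loader)
```

The nested call returns immediately, so `toy_loader` runs only once, and the flag is gone again once the outer call finishes.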

