Posted to dev@airflow.apache.org by Song Liu <so...@outlook.com> on 2018/05/12 11:58:43 UTC
About the DAG discovering not synced between scheduler and webserver
Hi,
When adding a new DAG, we sometimes see:
```
This DAG isn't available in the web server's DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database.
```
In views.py, DAGs under "DAGS_FOLDER" are collected by instantiating a DagBag object, as below:
```
dagbag = models.DagBag(settings.DAGS_FOLDER)
```
So the webserver depends on its own timing to collect DAGs. Why not simply query the metadata DB instead? If a DAG is active in the DB, it could be visible in the web UI right away.
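As a plain illustration of this suggestion (not Airflow's actual code), a view could read the active DAG list straight from the metadata DB on every request. The table and column names below are hypothetical stand-ins for Airflow's DAG table:

```python
import sqlite3

# Hypothetical stand-in for the metadata DB's DAG table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_active INTEGER)")
conn.executemany(
    "INSERT INTO dag VALUES (?, ?)",
    [("etl_daily", 1), ("old_report", 0), ("new_dag", 1)],
)

def list_active_dags(conn):
    # Query the DB on every request instead of a cached DagBag,
    # so a DAG marked active by the scheduler is visible immediately.
    rows = conn.execute(
        "SELECT dag_id FROM dag WHERE is_active = 1 ORDER BY dag_id"
    )
    return [dag_id for (dag_id,) in rows]

print(list_active_dags(conn))  # ['etl_daily', 'new_dag']
```

With this approach the web view reflects whatever the scheduler last wrote, at the cost of a DB round-trip per request.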
Could someone share the reasoning behind this design?
Thanks,
Song
Re: About the DAG discovering not synced between scheduler and webserver
Posted by Song Liu <so...@outlook.com>.
For example, the two webserver endpoints below get the DAG out of one global dagbag object, which is instantiated when the app instance is created. So the app/webserver can't be aware of any new DAGs until it is relaunched again? What is the purpose of this design?
```
dagbag = models.DagBag(settings.DAGS_FOLDER)

@expose('/run')
def run(self):
    dag_id = request.args.get('dag_id')
    dag = dagbag.get_dag(dag_id)

@expose('/trigger')
def trigger(self):
    dag_id = request.args.get('dag_id')
    dag = dagbag.get_dag(dag_id)
```
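A minimal sketch (plain Python, not Airflow code) of why such a module-level object goes stale: the snapshot is taken once at app-creation time, so anything added afterwards is invisible until the process restarts:

```python
# "Folder" contents at app start-up.
dags_folder = {"etl_daily", "report_weekly"}

# Module-level snapshot, analogous to the global dagbag built once
# when the webserver app is created.
dagbag_snapshot = set(dags_folder)

# The scheduler later discovers a new DAG file.
dags_folder.add("new_dag")

print("new_dag" in dags_folder)      # True  - on disk / in the DB
print("new_dag" in dagbag_snapshot)  # False - the webserver's stale view
```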
________________________________
From: 刘松 (Brain++组) <li...@megvii.com>
Sent: May 13, 2018, 5:39:40
To: dev@airflow.incubator.apache.org
Subject: Re: About the DAG discovering not synced between scheduler and webserver
Hi,
It seems that Airflow currently has to handle the situations below:
- DAGs discovered in scheduler, but not discovered by webserver yet
- DAGs discovered in webserver, but not discovered by scheduler yet
I still don't quite understand why the discovery logic lives separately in the scheduler and the webserver. In my understanding, the webserver only needs to display the orm_dags from the metadata DB. Is there any requirement or design consideration beyond this?
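The two situations above can be pictured as a set difference between what each process has discovered so far (an illustration, not Airflow internals):

```python
# DAG IDs each process has discovered at some instant.
scheduler_dags = {"a", "b", "new_from_scheduler"}
webserver_dags = {"a", "b", "new_from_webserver"}

# DAGs the scheduler knows about but the webserver hasn't loaded yet
# (these trigger the "isn't available in the web server's DagBag" banner).
only_in_scheduler = scheduler_dags - webserver_dags

# DAGs the webserver loaded but the scheduler hasn't marked active yet.
only_in_webserver = webserver_dags - scheduler_dags

print(sorted(only_in_scheduler))  # ['new_from_scheduler']
print(sorted(only_in_webserver))  # ['new_from_webserver']
```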
Many thanks for any information.
Thanks,
Song
Re: About the DAG discovering not synced between scheduler and webserver
Posted by "刘松 (Brain++组)" <li...@megvii.com>.
Hi,
It seems that Airflow currently has to handle the situations below:
- DAGs discovered in scheduler, but not discovered by webserver yet
- DAGs discovered in webserver, but not discovered by scheduler yet
I still don't quite understand why the discovery logic lives separately in the scheduler and the webserver. In my understanding, the webserver only needs to display the orm_dags from the metadata DB. Is there any requirement or design consideration beyond this?
Many thanks for any information.
Thanks,
Song