Posted to dev@airflow.apache.org by 落雨留音 <lu...@gmail.com> on 2021/05/06 08:22:28 UTC

DISCUSS: How to create and discover Airflow DAGs?

1. Why can an Airflow DAG not be read directly from the DB instead of
from a local file?
The current way of discovering DAGs is to scan local files and then
synchronize them to the DB. If I want to create a DAG, I need to create
a DAG file in the scheduler's dags_folder and then synchronize that file
to the webserver and workers. Why can't I store the DAG file in the DB
directly, so that the webserver, scheduler, and workers all obtain it
through the DB?
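
For concreteness, a DAG today is just a Python file dropped into the
scheduler's dags_folder, roughly like the sketch below (the dag_id and
command are made up for illustration):

# example_dag.py -- a file placed in dags_folder (illustrative names)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # the scheduler discovers this file by scanning dags_folder,
    # parses it, and writes the parsed result into the metadata DB
    BashOperator(task_id="hello", bash_command="echo hello")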

2. Why is there no createDag API?
Why is there no API to create a DAG, so that as soon as I call the API,
the DAG information is synchronized to the DB and to the local files of
the webserver, scheduler, and workers?

Re: DISCUSS: How to create and discover Airflow DAGs?

Posted by Ash Berlin-Taylor <as...@apache.org>.
The answer to both questions is "because the DAGs are Python files" and 
"because that's how it is now/we haven't written it yet".

Historically, Airflow needed the actual Python code in the DAGs to do 
anything (show them in the UI, schedule them, or execute them). With 
Airflow 2.0 and DAG serialization becoming mandatory, the UI no longer 
needs the files, and the "main" scheduler doesn't either, but the DAG 
parsing process still requires DAGs on disk, and actual task execution 
will always need the DAG files.
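
Roughly speaking, DAG serialization turns the parsed DAG structure into 
JSON-serializable data kept in the metadata DB, which the webserver and 
scheduler read back without re-parsing the file. A minimal sketch, 
assuming Airflow 2.x class paths:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.serialization.serialized_objects import SerializedDAG

with DAG(dag_id="serialization_demo", start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:
    BashOperator(task_id="noop", bash_command="true")

# roughly the representation stored in the DB for the UI/scheduler
data = SerializedDAG.to_dict(dag)
# the rebuilt object carries structure and metadata, but no user code
restored = SerializedDAG.from_dict(data)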

The main reason execution needs DAG files is to support Python 
operators (which call Python functions defined in your DAG file) and 
custom operators, which may also be defined on disk.
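
As an illustration: a python_callable is arbitrary code living in the 
DAG file, so only the file, not the serialized copy in the DB, can 
supply it to the worker. The names below are made up:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def _transform(**context):
    # arbitrary user code: local helpers, imports, closures --
    # none of this survives serialization into the DB
    return context["ds"]

with DAG(dag_id="needs_local_code", start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:
    PythonOperator(task_id="transform", python_callable=_transform)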

We could extend Airflow to support "submitting" DAGs via an API, with 
the condition that no Python operators and no custom operators are 
used. Or Python operators could work as long as there are no closures, 
advanced scoping, etc. But then we would have to start worrying about 
all the edge cases, and the security of the API becomes _much_ more 
important.
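
Purely hypothetically, such an API would have to accept a fully 
declarative payload, something like the sketch below; no such endpoint 
exists today and every field name here is invented:

# Hypothetical payload for a "submit DAG" API -- no such endpoint exists
# in Airflow; this only sketches what a declarative, Python-free DAG
# could look like, limited to built-in operators.
submit_dag_payload = {
    "dag_id": "declarative_example",
    "schedule_interval": "@daily",
    "start_date": "2021-01-01",
    "tasks": [
        {"task_id": "extract", "operator": "BashOperator",
         "bash_command": "run-extract.sh"},
        {"task_id": "load", "operator": "BashOperator",
         "bash_command": "run-load.sh", "upstream": ["extract"]},
    ],
}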

In short, because it's complicated and has some nasty edge cases.

We'll likely get there eventually.

-ash



On Thu, May 6 2021 at 16:22:28 +0800, 落雨留音 
<lu...@gmail.com> wrote:
> 1. Why can an Airflow DAG not be read directly from the DB instead 
> of from a local file?
> The current way of discovering DAGs is to scan local files and then 
> synchronize them to the DB. If I want to create a DAG, I need to 
> create a DAG file in the scheduler's dags_folder and then synchronize 
> that file to the webserver and workers. Why can't I store the DAG 
> file in the DB directly, so that the webserver, scheduler, and 
> workers all obtain it through the DB?
> 
> 2. Why is there no createDag API?
> Why is there no API to create a DAG, so that as soon as I call the 
> API, the DAG information is synchronized to the DB and to the local 
> files of the webserver, scheduler, and workers?