Posted to dev@airflow.apache.org by 落雨留音 <lu...@gmail.com> on 2021/05/06 08:22:28 UTC
DISCUSS: How to create and discovery airflow dag?
1. Why does Airflow not support reading DAGs directly from the db, instead
of reading them from a local file?
The current way of discovering DAGs is to scan local files and then
synchronize them to the db. If I want to create a DAG, I need to create a
DAG file in the scheduler's dags_folder, and then synchronize the DAG file
to the webserver and workers. Why can't I store the DAG file in the db
directly, so that the webserver, scheduler, and workers all obtain it from
the db?
2. Why is there no createDag API?
Why is there no API to create a DAG, so that once I call the API, the DAG
information is synchronized to the db and to the local files of the
webserver, scheduler, and workers?
Re: DISCUSS: How to create and discovery airflow dag?
Posted by Ash Berlin-Taylor <as...@apache.org>.
The answer to both questions is "because the DAGs are python files" and
"because that's how it is now/we haven't written it yet".
Historically, Airflow needed the actual Python code in the DAGs to do
anything (show them in the UI, schedule them, or execute them). With
Airflow 2.0 and DAG serialization becoming mandatory, the UI no longer
needs the files, and neither does the "main" scheduler. The DAG
parsing process still requires DAGs on disk, however, and the actual
task execution will always need DAG files.
The main reason execution needs DAG files is to support Python
operators (calling Python functions defined in your DAG) or custom
operators, which could also be defined on disk.
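
To illustrate the point about Python operators, here is a minimal sketch
using only the standard library. The dict layout and the helper names
(can_json, serialize_structure) are hypothetical and are not Airflow's
real serialization format. It shows that the structure of a DAG can be
stored as JSON, while a Python callable can at best be stored as a name
that still has to be resolved against code on disk.

```python
import json


def extract():
    """A task body; this is ordinary Python code living in the DAG file."""
    return [1, 2, 3]


# Hypothetical, simplified stand-in for a DAG definition (not Airflow's format).
dag = {
    "dag_id": "example",
    "tasks": [
        {"task_id": "extract", "operator": "PythonOperator", "python_callable": extract},
    ],
}


def can_json(obj):
    """Return True if obj can be dumped to JSON as-is."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False


def serialize_structure(dag):
    """Dump the DAG structure, replacing each callable with its dotted name only."""
    tasks = []
    for task in dag["tasks"]:
        entry = {}
        for key, value in task.items():
            if callable(value):
                # Only a reference survives; the function body is not stored.
                entry[key] = f"{value.__module__}.{value.__qualname__}"
            else:
                entry[key] = value
        tasks.append(entry)
    return json.dumps({"dag_id": dag["dag_id"], "tasks": tasks})


print(can_json(dag))             # the raw dict holds a function object
print(serialize_structure(dag))  # structure plus a name for the callable
```

Whatever reads the stored JSON back can show the DAG and its tasks, but to
run "extract" it still has to import the function from a file.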
We could extend Airflow to support "submitting" DAGs via an API, on
the condition that no Python operators and no custom operators are
used. Or the Python operator could work so long as there is no closure or
advanced scope etc. But then we have to start to worry about all the
edge cases, and the security of the API becomes _much_ more important.
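
As a rough illustration of the closure edge case, the sketch below uses the
standard pickle module as a stand-in for "sending a callable over an API".
This is an assumption made for the example, not how Airflow transports or
serializes DAGs, and make_task and can_pickle are hypothetical names.

```python
import pickle


def make_task(threshold):
    """Build a task callable that captures `threshold` from this scope."""
    def task(value):
        # `threshold` is not an argument; it lives in the closure.
        return value > threshold
    return task


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


task = make_task(10)

print(task(11))                                             # the closure works locally
print(can_pickle(task))                                     # but cannot be pickled
print(can_pickle({"task_id": "check", "threshold": 10}))    # plain data is fine
```

Plain data describing a task travels easily; a function that depends on
its enclosing scope does not, which is one of the edge cases an API for
submitting DAGs would have to handle or forbid.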
In short, because it's complicated and has some nasty edge cases.
We'll likely get there eventually.
-ash
On Thu, May 6 2021 at 16:22:28 +0800, 落雨留音
<lu...@gmail.com> wrote:
> 1. Why does airflow dag not support reading directly from db, but
> reading from a local file
> The current way of discovering dag is scan local files and then
> synchronize to db. If I want to create a dag, I need to create a dag
> file in the scheduler dags_folder, and then synchronize the dag file
> to the web and worker. Why can't I store dag file to the db directly?
> then web, scheduler, and worker all obtain the dag file through the
> db?
>
> 2. Why is there no createDag api
> Why is there no api to create dag? as long as I call the api, dag
> information can be synchronized to db and the local files of web,
> scheduler and worker?