Posted to dev@airflow.apache.org by Wilson Lian <ww...@google.com.INVALID> on 2017/02/14 21:40:59 UTC

Soliciting feedback: Using the Airflow CLI as a thin client

Hi all,

I'm interested in using the Airflow CLI as a thin client so that I can run
DAG-management commands like pause, unpause, trigger_dag, run, etc. from a
local machine against a remote Airflow cluster (e.g., running in Google
Container Engine).
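
Concretely, I'd like a session like this to work from my laptop, with no
local DAGs folder (the DAG and task names are placeholders):

    # pause/unpause scheduling for a DAG on the remote cluster
    airflow pause my_dag
    airflow unpause my_dag
    # create a new DAG run
    airflow trigger_dag my_dag
    # run a single task instance for the given execution date
    airflow run my_dag my_task 2017-02-14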

I have tried pointing [core]sql_alchemy_conn at the remote database, but
without a shared view of the DAGs folder, the different components don't
seem to be able to sync up. For example, list_dags looks at the local DAGs
folder, but not at the database; and using trigger_dag with a local DAG
file seems to put the DAG in the database, but its task instances never
execute, presumably because none of the nodes in the cluster have a copy of
the DAG file.
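
(For reference, this is the kind of configuration I tried; the connection
string below is a made-up example:)

    [core]
    # point the CLI at the cluster's remote metadata database
    sql_alchemy_conn = postgresql+psycopg2://airflow:PASSWORD@remote-db-host:5432/airflow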

I think in order for the CLI to be used as a thin client, the database,
rather than the DAGs folder, needs to be used as the source of truth for
DAGs (and possibly other objects). Can anyone provide an estimate of how
heavyweight such a change would be?

I'm also curious what people think about moving the choice of active
config file into a higher-level config file that contains references to
different configurations and a pointer to the "current" one.
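
Roughly, I'm imagining something like this (the file name and keys are
invented, purely to illustrate the idea):

    # hypothetical ~/.airflow/configs
    [configs]
    gke-prod = /home/me/airflow/gke-prod.cfg
    local = /home/me/airflow/local.cfg

    [settings]
    # which of the configs above the CLI should use
    current = gke-prod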

Re: Soliciting feedback: Using the Airflow CLI as a thin client

Posted by Bolke de Bruin <bd...@gmail.com>.
Hi,

It is indeed the ambition to have the CLI work as a thin client where that makes sense, but it will take a couple of releases before we are fully there. The CLI in 1.8.0 definitely doesn't have API endpoints for all functionality yet; with this release we are shipping the foundation, with one endpoint available: trigger_dag.
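
For example, with 1.8.0 you should be able to point the CLI at the API
rather than the local client via airflow.cfg (option names as I remember
them from the 1.8.0 default config; adjust the URL for your deployment):

    [cli]
    # send CLI commands over HTTP instead of the local client
    api_client = airflow.api.client.json_client
    endpoint_url = http://your-webserver:8080

With that in place, airflow trigger_dag <dag_id> goes over HTTP to the
webserver instead of straight to the database.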

It would definitely be appreciated to have patches that add to the API (list_dags is a good starting point) or improve the foundation (Swagger definitions, for example). There is also work to do on the architecture side (Should we integrate the security functionality of the API and the Web UI? Should we separate the API server from the Web UI server? Etc.) Ping me if you would like to work on that, or if you would like to share your thoughts (on the mailing list is fine, of course).

- Bolke

> On 15 Feb 2017, at 21:12, siddharth anand <sa...@apache.org> wrote:
> [quoted reply trimmed; Sid's full message appears below]

Re: Soliciting feedback: Using the Airflow CLI as a thin client

Posted by siddharth anand <sa...@apache.org>.
Hi Wilson,
I'm a huge fan of the CLI, and you are correct that the currently released
version of the CLI requires both a connection to the DB and access to the
DAGs folder.

In the new 1.8.0 release that is currently being driven by Bolke, the CLI
uses the API. I'm not 100% sure that all CLI commands have API endpoints,
but I suspect it's nearly complete if not already complete. That reminds
me: as we vet the 1.8.0 release candidates, we should test out both the
CLI and the API.

In a nutshell, the goal is for the CLI to be a thin wrapper that talks to
the API (running on the webserver), which would have access to both the DB
and the DAGs folder. This would allow anyone to run the CLI from any
machine that has access to the API endpoints.
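
For example, under that model a machine with no DAGs folder and no DB
credentials could trigger a run with nothing but HTTP access to the
webserver; roughly (the hostname and DAG id are placeholders, and the
exact experimental-API path may differ):

    # ask the webserver's API to create a DAG run
    curl -X POST http://airflow-webserver:8080/api/experimental/dags/my_dag/dag_runs
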
-s

On Tue, Feb 14, 2017 at 1:40 PM, Wilson Lian <ww...@google.com.invalid>
wrote:

> [quoted original trimmed; see the top of the thread]