Posted to dev@airflow.apache.org by Germain TANGUY <ge...@dailymotion.com> on 2017/07/18 06:56:27 UTC

Deploy procedure for new/modified DAGs

Hello everybody,

I would like to know what your procedure is to deploy new versions of your
DAGs, especially for DAGs that have external dependencies (bash scripts, etc.).
I use CeleryExecutor with multiple workers, so there is an issue of
consistency between the workers, the scheduler, and the webserver.

Today I pause the DAGs, wait until all running tasks complete, restart all
Airflow services, and unpause the DAGs. Is there a better way?
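
For reference, the pause/unpause steps can be scripted directly against the
metadata database; below is a minimal sketch assuming Airflow 1.x internals
(set_paused is a hypothetical helper, while settings.Session and DagModel
are the pieces Airflow exposes):

from airflow import settings
from airflow.models import DagModel

def set_paused(dag_id, is_paused):
    # Flip the is_paused flag for one DAG in the metadata database.
    session = settings.Session()
    dag = session.query(DagModel).filter(DagModel.dag_id == dag_id).first()
    if dag is not None:
        dag.is_paused = is_paused
        session.commit()
    session.close()

# Pause before deploying, unpause once the workers are back up:
#   set_paused("my_dag", True)   # ... deploy and restart ...
#   set_paused("my_dag", False)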

Best regards,

Germain T.


Re: Deploy procedure for new/modified DAGs

Posted by Germain TANGUY <ge...@dailymotion.com>.
Hello Arthur,

Thanks for your help,

In your case I would have to update the worker code, not necessarily the
webserver/scheduler, and I would set the --ship_dag option to False.

This deployment method implies that I have to pause all my DAGs, wait until
my queue is empty, and restart my workers to pull and install the new code
and dependencies. Some of my external dependencies take time to pip install,
so my service won't be available during that window. Am I correct in
assuming this?

I discovered that we can specify the queue the scheduler pushes tasks to and
the queue each worker listens to. Would it be a viable solution to create a
queue per commit, deploy a new set of workers for each commit, and kill the
old workers once their queue is empty? A sketch of the idea is below.
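
To make the idea concrete, here is a sketch; the queue name and the DAG are
hypothetical, but the queue argument on operators and the -q flag on
"airflow worker" are existing features:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

DEPLOY_QUEUE = "deploy_abc123"  # hypothetical name, e.g. the git commit SHA

dag = DAG("my_dag", start_date=datetime(2017, 7, 1),
          schedule_interval="@daily")

task = BashOperator(
    task_id="run_script",
    bash_command="echo running the newly deployed script",
    queue=DEPLOY_QUEUE,  # the scheduler routes this task to that Celery queue
    dag=dag,
)

# The matching workers for this deploy would be started with:
#   airflow worker -q deploy_abc123
# Old workers are then drained and killed once their queue is empty.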

Germain T.




On 19/07/17 07:53, "Arthur Wiedmer" <ar...@gmail.com> wrote:

    Hi Germain,
    
    As long as the structure of the DAG is not changed (tasks are the same and
    the dependency graph does not change), there should be no need to restart
    anything.
    
    The scheduler only needs the structure of the DAG to send the right
    message to Celery. Essentially, the message tells the worker to run an
    airflow run command for the given dag_id, task_id, and execution_date.
    While the webserver, for instance, might show you an older version of
    the bash script, the code executed will be the latest available on the
    worker. You can verify this in the task's logs, since the script is
    usually logged there.
    
    I hope this helps,
    
    Sincerely,
    Arthur
    
    
    On Mon, Jul 17, 2017 at 11:56 PM, Germain TANGUY <
    germain.tanguy@dailymotion.com> wrote:
    
    > Hello everybody,
    >
    > I would like to know what your procedure is to deploy new versions of
    > your DAGs, especially for DAGs that have external dependencies (bash
    > scripts, etc.).
    > I use CeleryExecutor with multiple workers, so there is an issue of
    > consistency between the workers, the scheduler, and the webserver.
    >
    > Today I pause the DAGs, wait until all running tasks complete, restart
    > all Airflow services, and unpause the DAGs. Is there a better way?
    >
    > Best regards,
    >
    > Germain T.
    >
    >
    


Re: Deploy procedure for new/modified DAGs

Posted by Arthur Wiedmer <ar...@gmail.com>.
Hi Germain,

As long as the structure of the DAG is not changed (tasks are the same and
the dependency graph does not change), there should be no need to restart
anything.

The scheduler only needs the structure of the DAG to send the right message
to Celery. Essentially, the message tells the worker to run an airflow run
command for the given dag_id, task_id, and execution_date.
While the webserver, for instance, might show you an older version of the
bash script, the code executed will be the latest available on the worker.
You can verify this in the task's logs, since the script is usually logged
there.
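
To make that concrete: the message carries only identifiers, never the DAG
code itself. A rough sketch of what the worker side boils down to
(simplified from the 1.x CeleryExecutor, not the exact implementation):

import subprocess

# The scheduler enqueues something equivalent to this command string;
# note that it names the task but does not ship any DAG code.
command = "airflow run my_dag my_task 2017-07-18T00:00:00 --local"

# The Celery worker essentially just shells out. "airflow run" then
# re-parses the DAG file from the worker's local dags folder, so whatever
# code is deployed on that machine is what actually executes.
subprocess.check_call(command, shell=True)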

I hope this helps,

Sincerely,
Arthur


On Mon, Jul 17, 2017 at 11:56 PM, Germain TANGUY <
germain.tanguy@dailymotion.com> wrote:

> Hello everybody,
>
> I would like to know what your procedure is to deploy new versions of
> your DAGs, especially for DAGs that have external dependencies (bash
> scripts, etc.).
> I use CeleryExecutor with multiple workers, so there is an issue of
> consistency between the workers, the scheduler, and the webserver.
>
> Today I pause the DAGs, wait until all running tasks complete, restart
> all Airflow services, and unpause the DAGs. Is there a better way?
>
> Best regards,
>
> Germain T.
>
>