Posted to dev@airflow.apache.org by Amit Jain <aj...@gmail.com> on 2018/03/23 08:16:09 UTC

Optimize duplicate run with SubDag Operator

Hi,

We have a use case where each DAG runs an ETL job for a single table, and a
separate DAG runs sanity checks on a few of those tables according to a
particular business case.

One such business case requires these tables to be synced with the entire
day's business data in our DWH.

We want the sanity-check DAG to verify the last run of the ETL job for the
tables concerned; if the last run did not happen after 12 AM, it should
trigger the entire ETL job, with retry granularity at the level of the ETL
job's operators rather than the complete SubDAG, since our ETL jobs are
costly.
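For illustration, the 12 AM cutoff check we have in mind could be sketched
like this (plain Python; the function name is hypothetical, and in Airflow
it would run inside a PythonOperator that looks up the last DagRun's end
time):

```python
from datetime import datetime, time

def ran_after_midnight(last_run_end: datetime, now: datetime) -> bool:
    """True if the last ETL run finished at or after 12 AM of the current day."""
    midnight = datetime.combine(now.date(), time.min)
    return last_run_end >= midnight

# A run that finished yesterday evening is considered stale.
print(ran_after_midnight(datetime(2018, 3, 22, 23, 30),
                         datetime(2018, 3, 23, 8, 0)))   # False
```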

We would like to avoid using sensors as they will block our celery workers
since our ETL jobs are time-consuming as well.
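One alternative we are considering is a branch-style callable (as used with
Airflow's BranchPythonOperator), which evaluates once and releases the
worker instead of polling. A minimal sketch, where the task ids and the
freshness check are our own assumptions:

```python
from datetime import datetime, time

def choose_branch(last_run_end: datetime, now: datetime) -> str:
    """Return the downstream task id: skip if the ETL already ran today,
    otherwise re-trigger the ETL DAG."""
    midnight = datetime.combine(now.date(), time.min)
    if last_run_end >= midnight:
        return "skip_etl"          # hypothetical no-op task id
    return "trigger_etl_dag"       # hypothetical task id, e.g. a TriggerDagRunOperator
```

In the real DAG the callable would fetch the last successful DagRun for the
ETL DAG from the metadata database rather than taking the timestamp as an
argument.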

Please guide us on how to handle this use case.

--
Thanks,
Amit

Re: Optimize duplicate run with SubDag Operator

Posted by Amit Jain <aj...@gmail.com>.
Trying again.
