Posted to dev@airflow.apache.org by Dennis O'Brien <de...@dennisobrien.net> on 2018/04/02 18:26:43 UTC

schedule backfill jobs in reverse order

Hi folks,

I recently asked this question on Gitter but didn't get any feedback.

Does anyone know if there is a way to get the scheduler to reverse the order
of the DAG runs? By default, a new DAG starts at start_date, then moves
sequentially forward in time until it is caught up (assuming catchup=True).
The same is true for a DAG that has just been enabled, a DAG that has been
cleared, and a backfill.

The behavior I'd like is for the scheduler to queue up the latest available
run first, so it starts with the most recent and then moves back in time. If
a more recent DAG run becomes eligible while the backfill is running, that
one should be queued next.
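The requested ordering can be sketched in plain Python. This is not Airflow's API: `NewestFirstQueue`, `add`, `pop_next` and `pending_dates` are hypothetical names chosen for illustration. Pending execution dates go into a heap keyed on negated timestamps, so the most recent date is always dequeued first, and a newer date that becomes eligible mid-backfill jumps ahead of the remaining historical ones.

```python
import heapq
from datetime import datetime, timedelta, timezone


class NewestFirstQueue:
    """Hypothetical queue of pending execution dates, most recent first."""

    def __init__(self):
        # heapq is a min-heap, so store negated timestamps to pop the newest date.
        self._heap = []
        self._seen = set()

    def add(self, execution_date):
        # Expects timezone-aware datetimes so timestamp() is unambiguous.
        # A date that has already been queued once is ignored.
        if execution_date in self._seen:
            return
        self._seen.add(execution_date)
        heapq.heappush(self._heap, (-execution_date.timestamp(), execution_date))

    def pop_next(self):
        # Return the most recent pending execution date, or None if empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[1]


def pending_dates(start_date, end_date, interval):
    # Every execution date from start_date to end_date inclusive, oldest first,
    # i.e. the default forward order.
    dates = []
    current = start_date
    while current <= end_date:
        dates.append(current)
        current += interval
    return dates


utc = timezone.utc
queue = NewestFirstQueue()
for d in pending_dates(datetime(2018, 3, 1, tzinfo=utc),
                       datetime(2018, 3, 5, tzinfo=utc), timedelta(days=1)):
    queue.add(d)

print(queue.pop_next().date())  # 2018-03-05
print(queue.pop_next().date())  # 2018-03-04
queue.add(datetime(2018, 3, 6, tzinfo=utc))  # a newer run becomes eligible
print(queue.pop_next().date())  # 2018-03-06
print(queue.pop_next().date())  # 2018-03-03
```

A heap keeps each insert and pop at O(log n), which matters for a long historical backfill where new dates keep arriving.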

Is there any way to accomplish this? Is this a feature that others would
find useful?

For some background, I have some jobs that make predictions and run a long
backfill for historical backtesting, which can mean no new predictions for a
week, depending on the job and how long the backfill takes. Ideally the most
recent jobs would take precedence over the historical jobs.

thanks,
Dennis

Re: schedule backfill jobs in reverse order

Posted by Dennis O'Brien <de...@dennisobrien.net>.
Thanks for the feedback, Max, and for the pointer to the line of code. I'll
take a look and make a PR to discuss further.

cheers,
Dennis


On Wed, Apr 4, 2018 at 10:49 AM Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:

> It's a totally reasonable use case. As you said, currently it only moves
> forward, though it could also move backwards. You could add a DAG argument
> `schedule_past_dagruns=False` (we probably need a better name), that would
> enable this feature.
>
> Here's the code that creates DagRuns; it seems like it may be easy to add a
> few lines to implement this feature.
>
> https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L761
>
> Max

Re: schedule backfill jobs in reverse order

Posted by Maxime Beauchemin <ma...@gmail.com>.
It's a totally reasonable use case. As you said, currently it only moves
forward, though it could also move backwards. You could add a DAG argument
`schedule_past_dagruns=False` (we probably need a better name), that would
enable this feature.
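To make the proposal concrete, here is a sketch of how such a flag might change the choice of the next execution date. It does not use Airflow's scheduler code: `next_execution_date` and its parameters are hypothetical, `schedule_past_dagruns` is only the placeholder name suggested here, and slots are assumed to be `start_date + k * interval` for a fixed `timedelta` (cron schedules are out of scope).

```python
from datetime import datetime, timedelta


def next_execution_date(existing_runs, start_date, latest_eligible, interval,
                        schedule_past_dagruns=False):
    """Hypothetical helper: choose the next execution date to create, or None.

    Slots are assumed to be start_date + k * interval for a fixed timedelta.
    """
    existing = set(existing_runs)
    if latest_eligible < start_date:
        return None
    if not schedule_past_dagruns:
        # Forward-only: continue one interval after the latest existing run.
        candidate = max(existing) + interval if existing else start_date
        return candidate if candidate <= latest_eligible else None
    # Reverse: start from the newest eligible slot and walk back in time,
    # returning the first slot that has no run yet.
    steps = (latest_eligible - start_date) // interval
    candidate = start_date + steps * interval
    while candidate >= start_date:
        if candidate not in existing:
            return candidate
        candidate -= interval
    return None


day = timedelta(days=1)
runs = [datetime(2018, 3, 1), datetime(2018, 3, 2)]
start, now = datetime(2018, 3, 1), datetime(2018, 3, 5)
print(next_execution_date(runs, start, now, day))  # 2018-03-03 00:00:00
print(next_execution_date(runs, start, now, day, schedule_past_dagruns=True))  # 2018-03-05 00:00:00
```

If a helper like this were called on every scheduler pass, the reverse branch would give a newly eligible date precedence automatically, because it always starts from the newest slot.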

Here's the code that creates DagRuns; it seems like it may be easy to add a
few lines to implement this feature.
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L761

Max

On Mon, Apr 2, 2018 at 7:17 PM, David Capwell <dc...@gmail.com> wrote:

> Nothing I know of. The scheduler finds the latest execution, then creates
> the next one based on the interval; this is also why updates to the start
> date have no effect (it doesn't try to fill gaps).

Re: schedule backfill jobs in reverse order

Posted by David Capwell <dc...@gmail.com>.
Nothing I know of. The scheduler finds the latest execution, then creates
the next one based on the interval; this is also why updates to the start
date have no effect (it doesn't try to fill gaps).
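A small sketch of the forward-only rule described here, assuming, as stated, that the next run is derived only from the latest existing execution plus the interval. `forward_next_run` and `unscheduled_gaps` are hypothetical names, not Airflow functions. Moving the start date earlier creates empty slots, but the forward rule never looks behind the latest run, so those slots stay unfilled unless something else (such as a backfill) covers them.

```python
from datetime import datetime, timedelta


def forward_next_run(existing_runs, interval):
    # Forward-only rule: latest existing execution + interval.
    return max(existing_runs) + interval


def unscheduled_gaps(existing_runs, start_date, interval):
    # Slots from start_date up to (not including) the latest run that have no
    # run. Assumes start_date lies on the same interval grid as the runs.
    existing = set(existing_runs)
    latest = max(existing)
    gaps = []
    slot = start_date
    while slot < latest:
        if slot not in existing:
            gaps.append(slot)
        slot += interval
    return gaps


day = timedelta(days=1)
runs = [datetime(2018, 3, 10), datetime(2018, 3, 11)]
new_start = datetime(2018, 3, 7)  # start date moved three days earlier

print(forward_next_run(runs, day))  # 2018-03-12 00:00:00
print([d.day for d in unscheduled_gaps(runs, new_start, day)])  # [7, 8, 9]
```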
