Posted to dev@airflow.apache.org by Shubham Gupta <sh...@gmail.com> on 2018/08/07 00:03:20 UTC

Re: Catchup By default = False vs LatestOnlyOperator

Thanks a lot for the useful info.

Regards
Shubham Gupta

On Wed, Jul 25, 2018 at 7:48 PM Sid Anand <sa...@apache.org> wrote:

> I will +1 James's comment and add to it. At Agari, one of our DAGs had as a
> final step the sending of an alert. The alerts only made sense when the DAG
> was current. But, sometimes, we did need to recompute some metrics based on
> historical data, but not alert on them. The LatestOnlyOperator was a good
> fit for this case.
>
> George/Ben,
> It would be great to document this discussion -- i.e. when to use one over
> another.
>
> -s
>
>
> On Mon, Jul 23, 2018 at 2:03 PM George Leslie-Waksman <wa...@gmail.com> wrote:
>
> > Ok, not so fringe; I'm glad it's working well for your use case, James.
> >
> > I retract my suggestion of deprecation.
> >
> > On Mon, Jul 23, 2018 at 12:58 PM James Meickle <jm...@quantopian.com.invalid> wrote:
> >
> > > We use LatestOnlyOperator in production. Generally our data is available
> > > on a regular schedule, and we update production services with it as soon
> > > as it is available; we might occasionally want to re-run historical days,
> > > in which case we want to run the same DAG but without interacting with
> > > live production services at all.
> > >
> > > On Mon, Jul 23, 2018 at 2:18 PM, George Leslie-Waksman <waksman@gmail.com> wrote:
> > >
> > > > As the author of LatestOnlyOperator, I intended it as a stopgap until
> > > > catchup=False landed.
> > > >
> > > > There are some (very) fringe use cases where you might still want
> > > > LatestOnlyOperator but in almost all cases what you want is probably
> > > > catchup=False.
> > > >
> > > > The situations where LatestOnlyOperator is still useful are where you
> > > > want to run most of your DAG for every schedule interval but you want
> > > > some of the tasks to run only on the latest run (not catching up, not
> > > > backfilling).
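George's rule can be sketched as a small, self-contained model. The function and task names are hypothetical and nothing here imports Airflow. It assumes the window check used by LatestOnlyOperator, as we understand the Airflow 1.10 implementation: tasks downstream of the operator proceed only if `execution_date + interval < now <= execution_date + 2 * interval`, i.e. only for the run that is currently the most recent one. The real operator may treat some runs, such as manually triggered ones, differently; that is ignored here.

```python
from datetime import datetime, timedelta


def latest_only_allows(execution_date, now, interval):
    """Simplified LatestOnlyOperator check (assumption: mirrors the 1.10 window rule).

    A run's interval ends at execution_date + interval, which is when the run
    becomes due; the next run becomes due one interval later. Only a run whose
    window (left_window, right_window] contains `now` lets downstream tasks proceed.
    """
    left_window = execution_date + interval  # this run became due
    right_window = left_window + interval    # the next run becomes due
    return left_window < now <= right_window


def tasks_executed(execution_dates, now, interval):
    """Map each run's execution_date to the hypothetical tasks that actually execute."""
    result = {}
    for ds in execution_dates:
        tasks = ["compute_metrics"]  # not gated: runs for every interval
        if latest_only_allows(ds, now, interval):
            tasks.append("send_alert")  # downstream of the latest-only gate
        result[ds] = tasks
    return result


if __name__ == "__main__":
    interval = timedelta(days=1)
    now = datetime(2018, 7, 23, 12, 0)
    # Catch-up of four daily runs; the 2018-07-22 run is the current one at `now`.
    runs = [datetime(2018, 7, d) for d in range(19, 23)]
    for ds, tasks in tasks_executed(runs, now, interval).items():
        print(ds.date(), tasks)
```

In this model every run still executes `compute_metrics`, but only the current run reaches `send_alert`, which matches the alert-only-when-current pattern Sid describes and James's case of re-running historical days without touching production services.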
> > > >
> > > > It may be best to deprecate LatestOnlyOperator at this point to avoid
> > > > confusion.
> > > >
> > > > --George
> > > >
> > > > On Sat, Jul 21, 2018 at 7:34 PM Ben Tallman <bt...@gmail.com> wrote:
> > > >
> > > > > As the author of catch-up: the idea is that in many cases your data
> > > > > doesn't "window" nicely and you instead want to just run as if it
> > > > > were a brilliant Cron...
> > > > >
> > > > > Ben
> > > > >
> > > > > > On Jul 20, 2018, at 11:39 PM, Shah Altaf <me...@gmail.com> wrote:
> > > > > >
> > > > > > Hi, my understanding is: if you use the LatestOnlyOperator (with
> > > > > > catchup left at its default of True), then when you run the DAG for
> > > > > > the first time you'll see a whole bunch of DAG runs queued up, and in
> > > > > > each run the LatestOnlyOperator will cause the rest of the DAG run
> > > > > > (everything downstream of it) to be skipped. Only the latest DAG run
> > > > > > will execute in full.
> > > > > >
> > > > > > With catchup = False, you should get just the latest DAG run.
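Shah's comparison can be checked with a toy model of run creation. It assumes a fixed daily schedule, that a run for a given execution_date is only created once its interval has ended, and that catchup=False creates only the most recent such run. `runs_to_create` and the dates are hypothetical, and the real scheduler's other rules (end_date, max_active_runs, and so on) are ignored.

```python
from datetime import datetime, timedelta


def runs_to_create(start_date, now, interval, catchup=True):
    """Simplified model of which execution dates get a DAG run (hypothetical helper).

    Assumption: a run for execution_date `ds` is due once its interval has
    ended, i.e. when ds + interval <= now.
    """
    due = []
    ds = start_date
    while ds + interval <= now:
        due.append(ds)
        ds += interval
    # catchup=True: one run per elapsed interval; catchup=False: only the latest one.
    return due if catchup else due[-1:]


if __name__ == "__main__":
    start = datetime(2018, 7, 1)
    now = datetime(2018, 7, 20, 23, 39)
    interval = timedelta(days=1)
    with_catchup = runs_to_create(start, now, interval, catchup=True)
    without_catchup = runs_to_create(start, now, interval, catchup=False)
    print(len(with_catchup), with_catchup[-1].date())        # 19 2018-07-19
    print(len(without_catchup), without_catchup[0].date())   # 1 2018-07-19
```

With catchup=True all 19 runs are queued, and a LatestOnlyOperator in each would skip its downstream tasks in every run except the 2018-07-19 one; with catchup=False only that 2018-07-19 run is created in the first place.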
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 20, 2018 at 10:58 PM Shubham Gupta <shubham180695.sg@gmail.com> wrote:
> > > > > >
> > > > > >> ---------- Forwarded message ---------
> > > > > >> From: Shubham Gupta <sh...@gmail.com>
> > > > > >> Date: Fri, Jul 20, 2018 at 2:38 PM
> > > > > >> Subject: Catchup By default = False vs LatestOnlyOperator
> > > > > >> To: <de...@airflow.incubator.apache.org>
> > > > > >>
> > > > > >>
> > > > > >> Hi!
> > > > > >>
> > > > > >> Can someone please explain the difference between catchup by
> > > > > >> default = False and LatestOnlyOperator?
> > > > > >>
> > > > > >> Regards
> > > > > >> Shubham Gupta
> > > > > >>
> > > > >
> > > >
> > >
> >
>