Posted to dev@airflow.apache.org by Chris Riccomini <cr...@apache.org> on 2016/05/17 17:14:36 UTC

Adhoc operators

Hey all,

Curious about what the 'adhoc' property is in BaseOperator. It appears to
be completely undocumented. What is this?

Cheers,
Chris

Re: Adhoc operators

Posted by Jeremiah Lowin <jl...@apache.org>.
I think it is a useful feature that nonetheless adds disproportionate
complexity -- for example, is there logic for when there is a task
downstream from an ad-hoc task, and the ad-hoc task isn't being run?

Perhaps there is a way to reengineer it around current Airflow idioms.
Maybe we can start by figuring out what exactly it's being used for? Here
are a few use-cases that come to mind (under the heading of "actions that
relate to my DAG but I don't want them to run every time... just at the
very beginning or occasionally on demand"):
- initializing a database table (and making sure it exists before running
downstream tasks)
- periodic maintenance, for example pruning, truncating tables, etc.
- initial logins, connection testing, etc.
- issuing some sort of debug command to a third party system before running
the rest of the DAG
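
To make the downstream question concrete, here is a toy sketch of two ways a scheduler could treat a task whose upstream is ad-hoc and was never run. This is not Airflow code: `runnable_tasks`, the `policy` argument, and the task names are hypothetical, and which behavior Airflow actually has is exactly what is being asked, so neither branch should be read as describing it.

```python
from typing import Dict, List, Set


def runnable_tasks(
    upstream: Dict[str, List[str]],
    adhoc: Set[str],
    policy: str,
) -> List[str]:
    """Return the non-ad-hoc tasks that a scheduled run could execute.

    upstream maps each task id to the task ids it depends on (acyclic).
    policy "block":  a task with an ad-hoc ancestor is never runnable.
    policy "ignore": ad-hoc upstreams are treated as already satisfied.
    """
    if policy not in ("block", "ignore"):
        raise ValueError("unknown policy: " + policy)

    def is_blocked(task: str) -> bool:
        # A task is blocked if any direct or indirect upstream is ad-hoc.
        for parent in upstream.get(task, []):
            if parent in adhoc or is_blocked(parent):
                return True
        return False

    result = []
    for task in upstream:
        if task in adhoc:
            continue  # ad-hoc tasks are never part of a scheduled run
        if policy == "block" and is_blocked(task):
            continue
        result.append(task)
    return result


# Hypothetical DAG: an ad-hoc table initialisation feeding a daily load.
upstream = {
    "init_table": [],
    "load": ["init_table"],
    "report": ["load"],
    "cleanup": [],
}
adhoc = {"init_table"}

print(runnable_tasks(upstream, adhoc, "block"))
print(runnable_tasks(upstream, adhoc, "ignore"))
```

Under "block" everything downstream of the ad-hoc task stalls until someone runs it by hand; under "ignore" the downstream tasks run even if the table was never initialized. Either choice adds a special case, which is the complexity concern raised above.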

On Wed, May 18, 2016 at 11:23 AM Maxime Beauchemin <
maximebeauchemin@gmail.com> wrote:

> The idea there is to be able to ship on-demand tasks along with your DAG.
> Is it not used because it's not documented?
>
> Deprecating may be harder than maintaining it. We'd have to start warning
> about deprecation in 2.0 soon and add the PR that removes this in the
> [eventual] 2.0 branch.
>
> I don't feel strongly about the feature; I added it because we had use
> cases for it, and we didn't have externally triggered DAGs at the time. I
> get how it can become confusing, both from a usability and code maintenance
> perspective.
>
> Max

Re: Adhoc operators

Posted by Maxime Beauchemin <ma...@gmail.com>.
The idea there is to be able to ship on-demand tasks along with your DAG.
Is it not used because it's not documented?

Deprecating may be harder than maintaining it. We'd have to start warning
about deprecation in 2.0 soon and add the PR that removes this in the
[eventual] 2.0 branch.

I don't feel strongly about the feature; I added it because we had use
cases for it, and we didn't have externally triggered DAGs at the time. I
get how it can become confusing, both from a usability and code maintenance
perspective.

Max

On Tue, May 17, 2016 at 12:49 PM, Chris Riccomini <cr...@apache.org>
wrote:

> Yeah, it feels like a pretty edge use case. It's not even documented. In the
> interest of simplifying and reducing bugs, it seems like we might just
> want to nuke this, or completely rethink the use cases.

Re: Adhoc operators

Posted by Chris Riccomini <cr...@apache.org>.
Yeah, it feels like a pretty edge use case. It's not even documented. In the
interest of simplifying and reducing bugs, it seems like we might just
want to nuke this, or completely rethink the use cases.

On Tue, May 17, 2016 at 12:22 PM, Jeremiah Lowin <jl...@apache.org> wrote:

> Perhaps ad-hoc tasks could be refactored as ad-hoc DAGs? It sounds like
> they are for infrequent initialization or maintenance tasks.

Re: Adhoc operators

Posted by Jeremiah Lowin <jl...@apache.org>.
Perhaps ad-hoc tasks could be refactored as ad-hoc DAGs? It sounds like
they are for infrequent initialization or maintenance tasks.

On Tue, May 17, 2016 at 11:21 AM Arthur Wiedmer <ar...@apache.org> wrote:

> We still have tasks in production that use this feature.
>
> Sometimes it has been used for one-off tasks that create simple static
> mapping tables (tables loaded from a static file that also lives in source
> control, a programmatically generated time dimension, etc.).
>
> Of course, maybe just having the task in question as a script that uses the
> airflow utilities would be sufficient.
>
> Best,
> Arthur

Re: Adhoc operators

Posted by Arthur Wiedmer <ar...@apache.org>.
We still have tasks in production that use this feature.

Sometimes it has been used for one-off tasks that create simple static
mapping tables (tables loaded from a static file that also lives in source
control, a programmatically generated time dimension, etc.).

Of course, maybe just having the task in question as a script that uses the
airflow utilities would be sufficient.

Best,
Arthur

On Tue, May 17, 2016 at 10:40 AM, Chris Riccomini <cr...@apache.org>
wrote:

> @Bolke/@Jeremiah
>
> When you make your changes to unify the backfiller and scheduler, it sounds
> like this can go away, right?

Re: Adhoc operators

Posted by Bolke de Bruin <bd...@gmail.com>.
I'm all for removing it, as I have difficulty imagining a use case and it increases complexity in several places, but maybe Max has one and is still using it?

> On 17 May 2016, at 19:40, Chris Riccomini <cr...@apache.org> wrote:
> 
> @Bolke/@Jeremiah
> 
> When you make your changes to unify the backfiller and scheduler, it sounds
> like this can go away, right?

Re: Adhoc operators

Posted by Chris Riccomini <cr...@apache.org>.
@Bolke/@Jeremiah

When you make your changes to unify the backfiller and scheduler, it sounds
like this can go away, right?

On Tue, May 17, 2016 at 10:38 AM, Maxime Beauchemin <
maximebeauchemin@gmail.com> wrote:

> The scheduler won't trigger tasks where `adhoc=True`. The CLI's backfill/test/run
> commands are the only way to trigger them. For backfill specifically,
> there's a `-a`, `--include_adhoc` flag to bring these tasks in scope for the
> backfill.
>
> Max

Re: Adhoc operators

Posted by Maxime Beauchemin <ma...@gmail.com>.
The scheduler won't trigger tasks where `adhoc=True`. The CLI's backfill/test/run
commands are the only way to trigger them. For backfill specifically,
there's a `-a`, `--include_adhoc` flag to bring these tasks in scope for the
backfill.

Max
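
The selection rule described in this message can be sketched as a toy model. This is not Airflow's scheduler or CLI code: the `Task` class, the two helper functions, and the task names are hypothetical and exist only for illustration. Only the `adhoc` attribute and the `-a` / `--include_adhoc` backfill flag come from the message itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for an operator; only `adhoc` mirrors the thread."""
    task_id: str
    adhoc: bool = False


def tasks_for_scheduler(tasks: List[Task]) -> List[str]:
    """The scheduler never picks up tasks marked adhoc=True."""
    return [t.task_id for t in tasks if not t.adhoc]


def tasks_for_backfill(tasks: List[Task], include_adhoc: bool = False) -> List[str]:
    """A backfill leaves ad-hoc tasks out unless include_adhoc is set,
    modelling the `-a` / `--include_adhoc` flag."""
    return [t.task_id for t in tasks if include_adhoc or not t.adhoc]


tasks = [
    Task("create_mapping_table", adhoc=True),
    Task("load_daily_partition"),
]

print(tasks_for_scheduler(tasks))
print(tasks_for_backfill(tasks))
print(tasks_for_backfill(tasks, include_adhoc=True))
```

The sketch covers only the scheduler and backfill paths; per the message, the `test` and `run` commands can also trigger an ad-hoc task directly.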

On Tue, May 17, 2016 at 10:14 AM, Chris Riccomini <cr...@apache.org>
wrote:

> Hey all,
>
> Curious about what the 'adhoc' property is in BaseOperator. It appears to
> be completely undocumented. What is this?
>
> Cheers,
> Chris
>