Posted to dev@airflow.apache.org by Song Liu <so...@outlook.com> on 2018/05/11 07:57:50 UTC

How to know the DAG is starting to run

Hi,

I have some setup that I want to run only once, when the DAG is constructed, but it seems the DAG is instantiated every time each of its operators runs.

So is there a function in the DAG that tells us it is starting to run?

Thanks,
Song
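
Song's observation can be reproduced without Airflow. The sketch below is only a model: it writes a stand-in "DAG file" whose module-level code appends a line to a log, then loads that file several times, the way separate scheduler and task processes each would. The names DAG_FILE_SOURCE, LOG_PATH, and count_setup_runs are made up for illustration and are not part of Airflow.

```python
import os
import runpy
import tempfile

# Stand-in for a DAG definition file. LOG_PATH is injected by the driver below.
DAG_FILE_SOURCE = (
    "with open(LOG_PATH, 'a') as fh:\n"
    "    fh.write('setup ran\\n')\n"
)


def count_setup_runs(parses):
    """Execute the stand-in DAG file `parses` times and count setup side effects."""
    with tempfile.TemporaryDirectory() as tmp:
        dag_path = os.path.join(tmp, "my_dag.py")
        log_path = os.path.join(tmp, "setup.log")
        with open(dag_path, "w") as fh:
            fh.write(DAG_FILE_SOURCE)
        for _ in range(parses):
            # Each call models one process loading the file from scratch.
            runpy.run_path(dag_path, init_globals={"LOG_PATH": log_path})
        with open(log_path) as fh:
            return len(fh.readlines())
```

Every load re-executes the module-level code, so anything placed at DAG-construction time runs once per load rather than once per pipeline run.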

Re: Re: How to know the DAG is starting to run

Posted by Song Liu <so...@outlook.com>.
Yes, the event I want to know about is the creation of a DagRun.

Re: Re: How to know the DAG is starting to run

Posted by Chris Palmer <ch...@crpalmer.com>.
It's not even clear to me what it means for a DAG to start running. The
creation of a DagRun for a specific execution date is completely
independent of the scheduling of any TaskInstances for that DagRun. There
could be a significant delay between those two events, either deliberately
encoded into the DAG or due to resource constraints.

What event are you actually interested in knowing about? The creation of a
DagRun? The starting of any task for a DagRun? Something else?

Maybe if you provided more details on exactly what "pipeline
environment setup" you are trying to do, it would help others understand
the problem you are trying to solve.

Chris


Re: How to know the DAG is starting to run

Posted by Song Liu <so...@outlook.com>.
Overriding "DAG.run" sounds like a workaround: if the DAG is running its first operation, then do some setup, etc.



Re: How to know the DAG is starting to run

Posted by Victor Noagbodji <vn...@amplify-nation.com>.
Hey,

I don't know if Airflow has a concept of DAG-level events or callbacks. (Operators do have callbacks, though.) You might get away with subclassing the DAG class or having a class decorator.

The source suggests that ".run()" is the method you want to override. You may want to call the original "super().run()" then do what you need to do afterwards.

Let's see if that works for you.
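
A minimal sketch of the override pattern described above, using a stand-in base class rather than the real airflow.DAG. BaseDag, SetupDag, and prepare_environment are hypothetical names. The message suggests calling super().run() first; for setup work the hook has to come before that call, so this sketch places it ahead. Whether the scheduler actually goes through run() for scheduled runs is not established in this thread and should be checked against the Airflow source before relying on it.

```python
class BaseDag:
    """Stand-in for a DAG-like class exposing run(); not the real airflow.DAG."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.events = []

    def run(self, *args, **kwargs):
        # The real method would hand the DAG to an execution job.
        self.events.append("run")


class SetupDag(BaseDag):
    """Subclass that performs environment setup before delegating to run()."""

    def prepare_environment(self):
        # Hypothetical hook: put the one-time pipeline setup here.
        self.events.append("setup")

    def run(self, *args, **kwargs):
        self.prepare_environment()
        return super().run(*args, **kwargs)
```

Calling SetupDag("demo").run() records the setup step first and the run step second.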



Re: How to know the DAG is starting to run

Posted by Song Liu <so...@outlook.com>.
Yes, I have thought of this approach, but a more elegant way would be to do it in the DAG itself: we don't want to add this "pipeline environment setup" as a separate operator; it should be handled more gracefully by the DAG.

Re: How to know the DAG is starting to run

Posted by Bolke de Bruin <bd...@gmail.com>.
I agree that some of the APIs are still lacking. The specific API you are referring to, for DagRun status, is being worked on as part of the Kubernetes executor (/dags/<string:dag_id>/dag_runs/<string:execution_date>).

But if you are missing some APIs why don’t you create a PR for them? It is not as if they are very hard to create.

Regards,
Bolke.
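
A hedged sketch of how an integration test might poll an endpoint shaped like the path quoted above. Only the path pattern comes from the message. The base URL, the JSON response with a "state" field, the terminal state names, and the helper names dag_run_url, fetch_json, and wait_for_dag_run are all assumptions made for illustration.

```python
import json
import time
from urllib.parse import quote
from urllib.request import urlopen


def dag_run_url(base_url, dag_id, execution_date):
    """Build the DagRun status URL from the path pattern quoted in the message."""
    return "{}/dags/{}/dag_runs/{}".format(
        base_url.rstrip("/"),
        quote(dag_id, safe=""),
        quote(execution_date, safe=""),
    )


def fetch_json(url):
    """Fetch a URL and decode the body as JSON."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_dag_run(url, fetch=fetch_json, attempts=30, delay=2.0):
    """Poll until the run reports a terminal state, or raise after `attempts` polls."""
    for _ in range(attempts):
        # Assumption: the response is a JSON object with a "state" field.
        state = fetch(url).get("state")
        if state in ("success", "failed"):
            return state
        time.sleep(delay)
    raise TimeoutError("DagRun did not finish after {} polls".format(attempts))
```

Passing `fetch` as a parameter lets a test substitute a fake in place of a live web server.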



Re: How to know the DAG is starting to run

Posted by Luke Diment <Lu...@westpac.co.nz>.
I’m very sure Airflow’s internals are awesome... but its programmatic integration capabilities are being found wanting...

The contents of this email and any attachments are confidential and may be legally privileged. If you are not the intended recipient please advise the sender immediately and delete the email and attachments. Any use, dissemination, reproduction or distribution of this email and any attachments by anyone other than the intended recipient is prohibited.

Re: How to know the DAG is starting to run

Posted by Luke Diment <Lu...@westpac.co.nz>.
Airflow currently cannot be asked programmatically, via an integration test, whether a DAG has run or what status a DAG is in, unless someone logs onto a box and manually runs an airflow command...


Re: How to know the DAG is starting to run

Posted by Brian Greene <br...@heisenbergwoodworking.com>.
Okay, I’ll bite... WT* does that mean?

One of the best things about Airflow is how easy it is to connect disparate systems... some would even say that’s much of the reason it exists.

It records reasonable metadata into an RDBMS (I suppose you could argue for other designs, but it’s pretty straightforward and, for this workload, easy enough to scale).

So... what did you mean?
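
Since run metadata lands in a relational database, one way to read a run's status from a test is a plain SQL query. The sketch below uses an in-memory SQLite table as a simplified stand-in: the table name dag_run and the three columns are assumptions modeled loosely on the metadata store, and a real schema has more columns and may differ between versions. latest_run_state and make_demo_db are hypothetical helpers.

```python
import sqlite3


def latest_run_state(conn, dag_id):
    """Return the state of the most recent run for dag_id, or None if absent."""
    row = conn.execute(
        "SELECT state FROM dag_run "
        "WHERE dag_id = ? ORDER BY execution_date DESC LIMIT 1",
        (dag_id,),
    ).fetchone()
    return row[0] if row else None


def make_demo_db():
    """Create an in-memory stand-in for the metadata table with two sample runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?, ?)",
        [
            ("etl", "2018-05-10T00:00:00", "success"),
            ("etl", "2018-05-11T00:00:00", "running"),
        ],
    )
    return conn
```

Querying the database directly couples a test to the schema, which is part of why a stable HTTP interface is being asked for elsewhere in this thread.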


Re: How to know the DAG is starting to run

Posted by Luke Diment <Lu...@westpac.co.nz>.
We should really build an interface for Airflow so that integration between disparate systems is more succinct!


Re: How to know the DAG is starting to run

Posted by James Meickle <jm...@quantopian.com>.
Song:

You can put an operator as the very first node in the DAG, and have
everything else in the DAG depend on it. For example, this is the approach
we use to only execute DAG tasks on stock market trading days.

-James M.
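
A minimal sketch of the "gate as first node" layout, under stated assumptions. The holiday set, task ids, and DAG id are hypothetical, and the actual trading-day check used at Quantopian is not shown in this thread. The import paths and the provide_context flag follow the Airflow 1.x layout current when this was written; later releases moved or changed them. Airflow is imported inside build_dag so the calendar helper can be exercised on its own.

```python
from datetime import date, datetime

# Hypothetical sample calendar; a real deployment would load an exchange calendar.
MARKET_HOLIDAYS = {date(2018, 5, 28), date(2018, 7, 4)}


def is_trading_day(day, holidays=MARKET_HOLIDAYS):
    """Return True on weekdays that are not listed holidays."""
    return day.weekday() < 5 and day not in holidays


def build_dag():
    # Local imports: Airflow 1.x module paths (assumption; they differ in 2.x).
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import ShortCircuitOperator

    dag = DAG(
        "trading_day_pipeline",
        start_date=datetime(2018, 5, 1),
        schedule_interval="@daily",
    )

    def gate(**context):
        # A falsy return value makes the operator skip all downstream tasks.
        return is_trading_day(context["execution_date"].date())

    first = ShortCircuitOperator(
        task_id="check_trading_day",
        python_callable=gate,
        provide_context=True,
        dag=dag,
    )
    work_a = DummyOperator(task_id="work_a", dag=dag)
    work_b = DummyOperator(task_id="work_b", dag=dag)

    # Everything else in the DAG depends on the first node.
    first >> work_a
    first >> work_b
    return dag
```

The same layout fits one-time setup: replace the gate with a task that performs the setup, and keep every other task downstream of it.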
