Posted to dev@airflow.apache.org by Bolke de Bruin <bd...@gmail.com> on 2017/01/10 08:25:13 UTC

Airflow 1.8.0 alpha 4

Dear All,

I have made Airflow 1.8.0 alpha 4 available at https://people.apache.org/~bolke/ . Again, this is not an Apache release yet; it is for testing purposes. I would consider this alpha to be a beta if not for the pending features. If the pending features are merged within a reasonable time frame (except for **, as there is currently no progress on it), then I am planning to mark the tarball as beta and only allow bug fixes and (very) minor features. Hopefully this week.

Blockers:

* None

Fixed issues:
* Regression in email
* LDAP case sensitivity
* one_failed task not being run: now seems to pass suddenly (so fixed?) -> need to investigate why
* Email attachments
* Pinned jinja2 to < 2.9.0 (2.9.1 has a confirmed regression)
* Improve time units for task performance charts
* XCom throws a duplicate / locking error
* Add execution_date to trigger_dag

Pending features:
* DAG.catchup: minor changes needed, documentation still required, integration tests seem to pass flawlessly
* Cgroups + impersonation: cleanup of patches ongoing, more tests and more elaborate documentation required. Integration tests not executed yet
* Schedule all pending DAG runs in a single scheduler loop: no progress (**)

Cheers!
Bolke

Re: Airflow 1.8.0 alpha 4

Posted by Bolke de Bruin <bd...@gmail.com>.
Hi Laura,

What version of MySQL are you using? 5.6.4 is the absolute minimum.

- Bolke
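
A quick way to check a server against that minimum, sketched under assumptions: the input is the kind of string MySQL's SELECT VERSION() returns (for example "5.6.35-log"), and the helper names are hypothetical, not part of Airflow.

```python
import re

MINIMUM_MYSQL = (5, 6, 4)  # the minimum stated in the reply above


def parse_mysql_version(version_string):
    """Extract (major, minor, patch) from a string such as '5.6.35-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_string)
    return tuple(int(group) for group in match.groups())


def meets_minimum(version_string, minimum=MINIMUM_MYSQL):
    """Return True if the reported server version is at least `minimum`."""
    return parse_mysql_version(version_string) >= minimum


# Hypothetical version strings, used only to exercise the check.
for reported in ("5.5.52-0ubuntu0.14.04.1", "5.6.4", "5.7.17-log"):
    print(reported, meets_minimum(reported))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "5.6.10" sorting before "5.6.4".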

> On 11 Jan 2017, at 22:52, Laura Lorenz <ll...@industrydive.com> wrote:
> 
> Hello! I just tried to install the release candidate alpha 4.
> 
> I was previously on airflow 1.7.0, and am not able to upgrade my metadata
> database on 1.8.0a4. I've detailed everything in a JIRA ticket at
> https://issues.apache.org/jira/browse/AIRFLOW-748. Let me know if I'm doing
> the install wrong, as I'd like to contribute to testing the new release and
> also get to use it soon :)


Re: Airflow 1.8.0 alpha 4

Posted by Laura Lorenz <ll...@industrydive.com>.
Hello! I just tried to install the release candidate alpha 4.

I was previously on airflow 1.7.0, and am not able to upgrade my metadata
database on 1.8.0a4. I've detailed everything in a JIRA ticket at
https://issues.apache.org/jira/browse/AIRFLOW-748. Let me know if I'm doing
the install wrong, as I'd like to contribute to testing the new release and
also get to use it soon :)

On Wed, Jan 11, 2017 at 4:47 PM, Dan Davydov <dan.davydov@airbnb.com.invalid> wrote:

> The task dependency engine code is well commented, but I can provide a
> high-level overview specifically for developers if there is interest (note
> that this would be the first documentation of its kind in that it would be
> developer-only documentation). The disadvantage is that it would create
> duplication with the logic itself on quite a large scale. Let me know,
> Bolke.

Re: Airflow 1.8.0 alpha 4

Posted by Chris Riccomini <cr...@apache.org>.
> For example someone upgrading might be bitten by the location of the
> process_manager’s logs (I was!)

Same. Started getting disk near-full alerts because I had no idea it was
dumping gigs of logs into /tmp.

On Thu, Jan 12, 2017 at 6:17 AM, Bolke de Bruin <bd...@gmail.com> wrote:

> Hey Dan,
>
> The engine is, but its settings are not. For example, someone upgrading
> might be bitten by the location of the process_manager’s logs (I was!),
> which are not in a particularly standard location by default. So an update
> to “UPDATING” would be appreciated, and an update to the generic scheduler
> documentation as well.
>
> Obviously it doesn’t need to be you who writes this up; my invitation was
> addressed to the general community.
>
> Cheers
> Bolke

Re: Airflow 1.8.0 alpha 4

Posted by Bolke de Bruin <bd...@gmail.com>.
Hey Dan,

The engine is, but its settings are not. For example, someone upgrading might be bitten by the location of the process_manager’s logs (I was!), which are not in a particularly standard location by default. So an update to “UPDATING” would be appreciated, and an update to the generic scheduler documentation as well.

Obviously it doesn’t need to be you who writes this up; my invitation was addressed to the general community.

Cheers
Bolke

> On 11 Jan 2017, at 22:47, Dan Davydov <da...@airbnb.com.INVALID> wrote:
> 
> The task dependency engine code is well commented, but I can provide a
> high-level overview specifically for developers if there is interest (note
> that this would be the first documentation of its kind in that it would be
> developer-only documentation). The disadvantage is that it would create
> duplication with the logic itself on quite a large scale. Let me know, Bolke.


Re: Airflow 1.8.0 alpha 4

Posted by Dan Davydov <da...@airbnb.com.INVALID>.
The task dependency engine code is well commented, but I can provide a
high-level overview specifically for developers if there is interest (note
that this would be the first documentation of its kind in that it would be
developer-only documentation). The disadvantage is that it would create
duplication with the logic itself on quite a large scale. Let me know, Bolke.

On Wed, Jan 11, 2017 at 1:30 PM, Chris Riccomini <cr...@apache.org>
wrote:

> @bolke, this sounds like a good list.

Re: Airflow 1.8.0 alpha 4

Posted by Chris Riccomini <cr...@apache.org>.
@bolke, this sounds like a good list.

On Wed, Jan 11, 2017 at 12:01 PM, Bolke de Bruin <bd...@gmail.com> wrote:

> Ok.
>
> For now, before we can call it “beta”, 4 items seem to be left:
>
> Blockers:
> * retry_delay not respected
> * poison pill due to re-queue before process has finished (to be
> investigated)
>
> Features:
> * cgroups + impersonation
> * dag.catchup (Ben Tallman -> Only documentation is missing).
>
> PRs that contain documentation would really be appreciated. In my opinion
> we are lacking there. Think about docs covering:
> * new scheduler behaviour and options
> * task dependency engine
> * api / kerberized api
> * …
>
> Cheers
> Bolke

Re: Airflow 1.8.0 alpha 4

Posted by Bolke de Bruin <bd...@gmail.com>.
Ok.

For now, before we can call it “beta”, 4 items seem to be left:

Blockers:
* retry_delay not respected
* poison pill due to re-queue before process has finished (to be investigated)

Features:
* cgroups + impersonation 
* dag.catchup (Ben Tallman -> Only documentation is missing).

PRs that contain documentation would really be appreciated. In my opinion we are lacking there. Think about docs covering:
* new scheduler behaviour and options
* task dependency engine
* api / kerberized api
* …

Cheers
Bolke
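
For context on the first blocker: retry_delay is the task setting that says how long to wait after a failed attempt before retrying. The thread does not describe the exact bug, so the following is only a sketch of the expected behaviour, with hypothetical function names; it is not Airflow's scheduler code.

```python
from datetime import datetime, timedelta


def earliest_retry(failed_at, retry_delay):
    """Earliest moment at which a failed attempt should become eligible again."""
    return failed_at + retry_delay


def may_retry(now, failed_at, retry_delay):
    """True once the full retry_delay has elapsed since the failure."""
    return now >= earliest_retry(failed_at, retry_delay)


# Hypothetical failure time and delay, used only for illustration.
failed_at = datetime(2017, 1, 11, 12, 0, 0)
delay = timedelta(minutes=5)

# Two minutes after the failure the task should still be waiting.
print(may_retry(datetime(2017, 1, 11, 12, 2, 0), failed_at, delay))
# Exactly five minutes after the failure it becomes eligible.
print(may_retry(datetime(2017, 1, 11, 12, 5, 0), failed_at, delay))
```

A scheduler that "does not respect" the delay would re-queue the task in the first case as well.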

> On 11 Jan 2017, at 18:59, Arthur Wiedmer <ar...@gmail.com> wrote:
> 
> +1
> 
> We can always think about different ways of doing this later (fair share
> scheduling etc...)
> 
> Best,
> Arthur


Re: Airflow 1.8.0 alpha 4

Posted by Arthur Wiedmer <ar...@gmail.com>.
+1

We can always think about different ways of doing this later (fair share
scheduling etc...)

Best,
Arthur
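
The starvation concern in the message quoted below can be illustrated with a toy sketch. It is not the scheduler code from either pull request: the DAG and task names are made up, the slot count is arbitrary, and plain round-robin stands in for the "fair share" idea only as the simplest alternative.

```python
from collections import deque


def fill_sequential(pending, slots):
    """Fill the queue DAG by DAG: take every task of one DAG before the next."""
    flat = [task for tasks in pending.values() for task in tasks]
    return flat[:slots]


def fill_round_robin(pending, slots):
    """Take one task per DAG in turn, so every DAG gets a share of the slots."""
    queues = deque(deque(tasks) for tasks in pending.values() if tasks)
    picked = []
    while queues and len(picked) < slots:
        current = queues.popleft()
        picked.append(current.popleft())
        if current:
            # This DAG still has work left; send it to the back of the line.
            queues.append(current)
    return picked


# Hypothetical pending work: three DAGs competing for four executor slots.
pending = {
    "dag_a": ["a1", "a2", "a3", "a4"],
    "dag_b": ["b1", "b2"],
    "dag_c": ["c1"],
}

print(fill_sequential(pending, 4))   # every slot goes to dag_a
print(fill_round_robin(pending, 4))  # slots are spread across all three DAGs
```

With sequential filling, dag_b and dag_c get nothing until dag_a is drained, which is the bias described in the quoted message.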

On Wed, Jan 11, 2017 at 4:46 AM, Bolke de Bruin <bd...@gmail.com> wrote:

> Dear All,
>
> I would like to drop "Schedule all pending DAG runs in a single scheduler
> loop" from the 1.8.0 release (updated:
> https://github.com/apache/incubator-airflow/pull/1980, original:
> https://github.com/apache/incubator-airflow/pull/1906). The reason for this
> is that it, imho, biases the scheduler towards a single DAG: it fills the
> queue with tasks from one DAG and then goes to the next DAG, starving DAGs
> that come after the first for resources. As such it should be updated, and
> that will take time.
>
> Please let me know if I am incorrect.
>
> Thanks
> Bolke

Re: Airflow 1.8.0 alpha 4

Posted by Chris Riccomini <cr...@apache.org>.
Hey Bolke,

I'm installing the latest alpha as we speak.

As far as I'm concerned, we've hit no remaining blockers in our dev
environment. And I'm ready for beta whenever you cut it (I'll install the
beta in our prod cluster).

Re: dropping scheduler change, I'm fine with that. I would rather favor
stability anyway, and messing with the scheduler this late in a release
doesn't seem prudent to me.

Cheers,
Chris

On Wed, Jan 11, 2017 at 6:39 AM, Alex Van Boxel <al...@vanboxel.be> wrote:

> Hey Bolke, I'll be off for a few days but I started looking at the
> CHANGELIST to make sure we now have a list of changes for 1.8 (seems like a
> big list...).
>
> I'll also drop the trigger issue for 1.8 and see what we can do
> for the next version. I'll make it into a documentation issue.
>
> On Wed, Jan 11, 2017 at 1:46 PM Bolke de Bruin <bd...@gmail.com> wrote:
>
> > Dear All,
> >
> > I would like to drop "Schedule all pending DAG runs in a single scheduler
> > loop” from the 1.8.0 release (updated:
> > https://github.com/apache/incubator-airflow/pull/1980 <
> > https://github.com/apache/incubator-airflow/pull/1980>, original:
> > https://github.com/apache/incubator-airflow/pull/1906 <
> > https://github.com/apache/incubator-airflow/pull/1906>). The reason for
> > this is that it, imho, biases the scheduler towards a single DAG as it
> > fills the queue with tasks from one DAG and then goes to the next DAG.
> > Starving DAGs that come after the first for resources. As such it should
> > be updated and that will take time.
> >
> > Please let me know if I am incorrect.
> >
> > Thanks
> > Bolke
> > --
>   _/
> _/ Alex Van Boxel
>

Re: Airflow 1.8.0 alpha 4

Posted by Alex Van Boxel <al...@vanboxel.be>.
Hey Bolke, I'll be off for a few days but I started looking at the
CHANGELIST to make sure we now have a list of changes for 1.8 (seems like a
big list...).

I'll also drop the trigger issue for 1.8 and see what we can do
for the next version. I'll make it into a documentation issue.

On Wed, Jan 11, 2017 at 1:46 PM Bolke de Bruin <bd...@gmail.com> wrote:

> Dear All,
>
> I would like to drop "Schedule all pending DAG runs in a single scheduler
> loop” from the 1.8.0 release (updated:
> https://github.com/apache/incubator-airflow/pull/1980 <
> https://github.com/apache/incubator-airflow/pull/1980>, original:
> https://github.com/apache/incubator-airflow/pull/1906 <
> https://github.com/apache/incubator-airflow/pull/1906>). The reason for
> this is that it, imho, biases the scheduler towards a single DAG as it
> fills the queue with tasks from one DAG and then goes to the next DAG.
> Starving DAGs that come after the first for resources. As such it should be
> updated and that will take time.
>
> Please let me know if I am incorrect.
>
> Thanks
> Bolke
> --
  _/
_/ Alex Van Boxel

Re: Airflow 1.8.0 alpha 4

Posted by Bolke de Bruin <bd...@gmail.com>.
Dear All,

I would like to drop "Schedule all pending DAG runs in a single scheduler loop" from the 1.8.0 release (updated: https://github.com/apache/incubator-airflow/pull/1980, original: https://github.com/apache/incubator-airflow/pull/1906). The reason is that, imho, it biases the scheduler towards a single DAG: it fills the queue with tasks from one DAG and only then moves on to the next, starving the DAGs that come after the first of resources. As such it should be updated, and that will take time.
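
To make the starvation concern concrete, here is a minimal Python sketch. It is not Airflow's scheduler code and is not taken from either pull request; the function names, DAG ids, task names and the slot count are all hypothetical. It only contrasts filling a limited queue one DAG at a time with a simple round-robin fill, which is one possible alternative and not a proposal made in this thread.

```python
from itertools import zip_longest


def fill_dag_by_dag(pending, slots):
    """Queue every pending task of one DAG before moving on to the next."""
    queue = [(dag_id, task) for dag_id, tasks in pending.items() for task in tasks]
    return queue[:slots]


def fill_round_robin(pending, slots):
    """Take one task per DAG per pass, so each DAG gets a share of the slots."""
    queue = []
    # zip_longest pads shorter task lists with None; those entries are skipped.
    for one_pass in zip_longest(*pending.values()):
        for dag_id, task in zip(pending, one_pass):
            if task is not None:
                queue.append((dag_id, task))
    return queue[:slots]


# Hypothetical pending work: dag_a alone has enough tasks to use every slot.
pending = {
    "dag_a": ["a1", "a2", "a3", "a4"],
    "dag_b": ["b1", "b2"],
    "dag_c": ["c1"],
}

# With 4 slots, the DAG-by-DAG fill serves only the first DAG.
print(sorted({dag for dag, _ in fill_dag_by_dag(pending, 4)}))   # ['dag_a']
# The round-robin fill gives every DAG at least one slot.
print(sorted({dag for dag, _ in fill_round_robin(pending, 4)}))  # ['dag_a', 'dag_b', 'dag_c']
```

With four slots, the first strategy hands all of them to dag_a while dag_b and dag_c wait, which is the bias described above.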

Please let me know if I am incorrect.

Thanks
Bolke

> On 10 Jan 2017, at 09:25, Bolke de Bruin <bd...@gmail.com> wrote:
> 
> Dear All,
> 
> I have made Airflow 1.8.0 alpha 4 available at https://people.apache.org/~bolke/ <https://people.apache.org/~bolke/> . Again no Apache release yet - this is for testing purposes. I consider this Alpha to be a Beta if not for the pending features. If the pending features are merged within a reasonable time frame (except for **, as no progress currently) then I am planning to mark the tarball as Beta and only allow bug fixes and (very) minor features. This week hopefully. 
> 
> Blockers:
> 
> * None
> 
> Fixed issues
> * Regression in email
> * LDAP case sensitivity
> * one_failed task not being run: now seems to pass suddenly (so fixed?) -> need to investigate why
> * Email attachments
> * Pinned jinja2 to < 2.9.0 (2.9.1 has a confirmed regression)
> * Improve time units for task performance charts 
> * XCom throws an duplicate / locking error
> * Add execution_date to trigger_dag
> 
> Pending features:
> * DAG.catchup : minor changes needed, documentation still required, integration tests seem to pass flawlessly
> * Cgroups + impersonation: clean up of patches on going, more tests and more elaborate documentation required. Integration tests not executed yet
> * Schedule all pending DAG runs in a single scheduler loop: no progress (**)
> 
> Cheers!
> Bolke