Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2019/07/23 06:32:23 UTC

Travis CI random failures

Hello everyone,

We've started to experience some random failures on Travis related to lack
of resources: those are either Out of Memory errors or a lack of CPUs to run
Kubernetes builds.

I tried to rerun those, thinking it was an intermittent error. It started
happening yesterday and I have not seen it before so I rather doubt it is
related to the latest changes.

But I do not want to risk everyone being blocked, so I am now testing on my
own fork whether reverting the latest CI changes helps. I will let you know,
and I will revert if I find that the old CI works in a stable way.

In the meantime, I will cancel all outstanding builds that are blocking
our queue and will test both the old CI and the new CI in our fork :( (the
Travis queue limit is not helping).

Can you please hold off on rebasing/pushing new PRs until I check it?

Example failures:


   - OSError: [Errno 12] Cannot allocate memory (
   https://travis-ci.org/apache/airflow/jobs/562395978)
   - [ERROR NumCPU]: the number of available CPUs 1 is less than the
   required 2 (https://travis-ci.org/apache/airflow/jobs/562395978)
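
A pre-flight check at the start of a CI job can make this kind of failure
obvious before the real build starts. The following is only a minimal
sketch, not part of the Airflow CI scripts: the function names and the
memory threshold are hypothetical, the 2-CPU requirement is taken from the
NumCPU error above, and the memory detection assumes a Linux worker.

```python
import os

REQUIRED_CPUS = 2        # taken from the "[ERROR NumCPU]" message above
REQUIRED_MEM_GB = 7.0    # hypothetical threshold; adjust to what the build needs


def detect_resources():
    """Return (cpu_count, total_memory_gb) as seen by this process (Linux)."""
    cpus = os.cpu_count() or 1
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    return cpus, page_size * phys_pages / 1024 ** 3


def resource_problems(cpus, mem_gb,
                      required_cpus=REQUIRED_CPUS,
                      required_mem_gb=REQUIRED_MEM_GB):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if cpus < required_cpus:
        problems.append(f"only {cpus} CPU(s) available, {required_cpus} required")
    if mem_gb < required_mem_gb:
        problems.append(
            f"only {mem_gb:.2f} GB memory available, {required_mem_gb} GB required"
        )
    return problems


def main():
    """Print any problems and return a shell-style exit code."""
    problems = resource_problems(*detect_resources())
    for problem in problems:
        print("PRE-FLIGHT CHECK FAILED:", problem)
    return 1 if problems else 0
```

Calling `sys.exit(main())` as the first build step would fail the job early
with a clear message. Note that `os.cpu_count()` reports the CPUs of the
machine and does not account for container CPU limits.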


J.

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
Yep. We are back. We still have a slow queue, but at least the builds are
not failing randomly. Please rebase your builds on top of the current
master and push again to trigger builds.

On Tue, Jul 23, 2019 at 9:31 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Looks like they fixed it: https://github.com/travis-ci/worker/issues/604

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
Looks like they fixed it: https://github.com/travis-ci/worker/issues/604

On Tue, Jul 23, 2019 at 8:38 PM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> I see issues at different Apache projects as well, Druid and Avro. They're
> running out of memory. Let's see how Travis responds.
>
> Cheers, Fokko

Re: Travis CI random failures

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
I see issues at different Apache projects as well, Druid and Avro. They're
running out of memory. Let's see how Travis responds.

Cheers, Fokko

Op di 23 jul. 2019 om 19:43 schreef Jarek Potiuk <Ja...@polidea.com>:

> FYI. Still not fixed. Others experience this as well:
> https://github.com/travis-ci/worker/issues/604

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
FYI. Still not fixed. Others experience this as well:
https://github.com/travis-ci/worker/issues/604

On Tue, Jul 23, 2019 at 11:34 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> No good news yet. We are getting randomly assigned 1CPU /3.5GB mem
> instances still. Infrastructure is on it.
>
> On Tue, Jul 23, 2019 at 10:49 AM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
>> It looks like we are back to the original specs. I am running tests and
>> will re-enable everything if I see it works.
>>
>> J.
>>
>> On Tue, Jul 23, 2019 at 10:34 AM Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>>
>>> From INFRA: "I have confirmed that our builds appear to be running with
>>> 3.75GB memory and 1 core currently. This does not match Travis' standard
>>> specs (7.5GB and 2 cores), and I have raised a ticket with their support. I
>>> will respond when we hear back from Travis."
>>>
>>>
>>> On Tue, Jul 23, 2019 at 10:26 AM Jarek Potiuk <Ja...@polidea.com>
>>> wrote:
>>>
>>>> It's definitely confirmed that the problem is on the Travis CI side:
>>>>
>>>> I re-ran the commit from before the new CI was introduced (I cherry-picked a
>>>> small doc fix related to the recent Sphinx dependency update) and it fails in
>>>> exactly the same way (memory and CPU problems):
>>>> https://travis-ci.org/apache/airflow/builds/562450592.
>>>>
>>>> For now I cannot do much but wait for INFRA's response (and work on
>>>> a GitLab CI replacement for Travis).
>>>>
>>>> I recommend bringing some popcorn. It's going to be an interesting one
>>>> to watch.
>>>>
>>>> J.
>>>>
>>>> On Tue, Jul 23, 2019 at 9:43 AM Jarek Potiuk <Ja...@polidea.com>
>>>> wrote:
>>>>
>>>>> It's now pretty consistent and happens pretty much every time using
>>>>> the old build system - for example here:
>>>>> https://travis-ci.org/apache/airflow/builds/562435992.
>>>>>
>>>>> I will cancel all PRs and disable automated PR builds on Travis until
>>>>> we solve the problem, as running them is pointless: new PRs will simply
>>>>> queue and fail constantly.
>>>>>
>>>>> I opened a critical infrastructure ticket:
>>>>> https://issues.apache.org/jira/browse/INFRA-18787 and I am running
>>>>> some additional tests: I am running the builds from the commit before the
>>>>> new CI so that I can see whether another change since then could have
>>>>> caused it.
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>> On Tue, Jul 23, 2019 at 8:55 AM Jarek Potiuk <Ja...@polidea.com>
>>>>> wrote:
>>>>>
>>>>>> Update2: I can confirm that the same memory/resource related issues
>>>>>> happen in my Travis CI forks with reverted changes :(
>>>>>> https://travis-ci.org/potiuk/airflow/builds/562430507 . I will
>>>>>> escalate it to Travis/APACHE infrastructure
>>>>>>
>>>>>> On Tue, Jul 23, 2019 at 8:35 AM Jarek Potiuk <
>>>>>> Jarek.Potiuk@polidea.com> wrote:
>>>>>>
>>>>>>> Update: it looks like it's Travis's problem: I reverted the CI
>>>>>>> changes and we have the same CPU problem in the old build:
>>>>>>> https://travis-ci.org/potiuk/airflow/jobs/562430517 .

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
No good news yet. We are still getting randomly assigned 1 CPU / 3.5GB mem
instances. Infrastructure is on it.

On Tue, Jul 23, 2019 at 10:49 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> It looks like we are back to the original specs. I am running tests and
> will re-enable everything if I see it works.
>
> J.

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
It looks like we are back to the original specs. I am running tests and
will re-enable everything if I see it works.

J.

On Tue, Jul 23, 2019 at 10:34 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> From INFRA: "I have confirmed that our builds appear to be running with
> 3.75GB memory and 1 core currently. This does not match Travis' standard
> specs (7.5GB and 2 cores), and I have raised a ticket with their support. I
> will respond when we hear back from Travis."

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
From INFRA: "I have confirmed that our builds appear to be running with
3.75GB memory and 1 core currently. This does not match Travis' standard
specs (7.5GB and 2 cores), and I have raised a ticket with their support. I
will respond when we hear back from Travis."
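
A fail-fast check at the start of a build could surface this kind of
under-provisioning right away, instead of as an OSError or a NumCPU error
much later in the job. Below is a minimal sketch in Python, assuming a Linux
worker. The names check_resources, detect_resources and main are
hypothetical, and the thresholds are assumptions derived from the standard
specs quoted above; none of this is part of the Airflow CI scripts.

```python
import os
import sys

# Assumed thresholds (not from Airflow's CI): the quoted standard specs are
# 2 cores and 7.5GB, so the memory floor leaves some headroom below 7.5GB.
MIN_CPUS = 2
MIN_MEM_GB = 7.0


def check_resources(cpus, mem_gb, min_cpus=MIN_CPUS, min_mem_gb=MIN_MEM_GB):
    """Return a list of problems; the list is empty if the worker is adequate."""
    problems = []
    if cpus < min_cpus:
        problems.append(
            f"available CPUs {cpus} is less than the required {min_cpus}"
        )
    if mem_gb < min_mem_gb:
        problems.append(
            f"available memory {mem_gb:.2f}GB is less than "
            f"the required {min_mem_gb:.2f}GB"
        )
    return problems


def detect_resources():
    """Read the CPU count and total physical memory via POSIX sysconf."""
    cpus = os.cpu_count() or 1
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cpus, mem_bytes / 1024 ** 3


def main():
    """Print any problems to stderr and return a process exit code."""
    problems = check_resources(*detect_resources())
    for problem in problems:
        print(f"[ERROR] {problem}", file=sys.stderr)
    return 1 if problems else 0
```

Calling sys.exit(main()) as the first step of a build script would stop the
job immediately on an undersized worker. Keeping the comparison in
check_resources separate from the detection makes it easy to test with the
numbers reported here (1 core, 3.75GB).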


On Tue, Jul 23, 2019 at 10:26 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> It's definitely confirmed that the problem is on the Travis CI side:
>
> I re-ran the commit from before the new CI was introduced (I cherry-picked
> a small doc fix related to the recent Sphinx dependency update) and it
> fails in exactly the same way (memory and CPU problems):
> https://travis-ci.org/apache/airflow/builds/562450592.
>
> For now I cannot do much but wait for INFRA's response (and work on
> a GitLab CI replacement for Travis).
>
> I recommend bringing some popcorn. It's going to be an interesting one to
> watch.
>
> J.

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
It's definitely confirmed that the problem is on the Travis CI side:

I re-ran the commit from before the new CI was introduced (I cherry-picked
a small doc fix related to the recent Sphinx dependency update) and it
fails in exactly the same way (memory and CPU problems):
https://travis-ci.org/apache/airflow/builds/562450592.

For now I cannot do much but wait for INFRA's response (and work on
a GitLab CI replacement for Travis).

I recommend bringing some popcorn. It's going to be an interesting one to
watch.

J.

On Tue, Jul 23, 2019 at 9:43 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> It's now pretty consistent and happens pretty much every time using the
> old build system - for example here:
> https://travis-ci.org/apache/airflow/builds/562435992.
>
> I will cancel all PR builds and disable automated PR builds on Travis until
> we solve the problem. Leaving them on is pointless: new PRs will simply
> queue and fail constantly.
>
> I opened a critical infrastructure ticket:
> https://issues.apache.org/jira/browse/INFRA-18787 and I am running some
> additional tests: I am running the builds from the commit before the new CI
> to see whether another change since then could have caused it.
>
> J.

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
It's now pretty consistent and happens pretty much every time using the old
build system - for example here:
https://travis-ci.org/apache/airflow/builds/562435992.

I will cancel all PR builds and disable automated PR builds on Travis until
we solve the problem. Leaving them on is pointless: new PRs will simply
queue and fail constantly.

I opened a critical infrastructure ticket:
https://issues.apache.org/jira/browse/INFRA-18787 and I am running some
additional tests: I am running the builds from the commit before the new CI
to see whether another change since then could have caused it.

J.


On Tue, Jul 23, 2019 at 8:55 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Update 2: I can confirm that the same memory/resource-related issues happen
> in my Travis CI fork with the changes reverted :(
> https://travis-ci.org/potiuk/airflow/builds/562430507 . I will escalate
> it to Travis/Apache infrastructure.

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
Update 2: I can confirm that the same memory/resource-related issues happen
in my Travis CI fork with the changes reverted :(
https://travis-ci.org/potiuk/airflow/builds/562430507 . I will escalate it
to Travis/Apache infrastructure.

On Tue, Jul 23, 2019 at 8:35 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Update: it looks like it's Travis's problem: I reverted the CI changes and
> we have the same CPU problem in the old build:
> https://travis-ci.org/potiuk/airflow/jobs/562430517 .

Re: Travis CI random failures

Posted by Jarek Potiuk <Ja...@polidea.com>.
Update: it looks like it's Travis's problem: I reverted the CI changes and
we have the same CPU problem in the old build:
https://travis-ci.org/potiuk/airflow/jobs/562430517 .

On Tue, Jul 23, 2019 at 8:32 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Hello everyone,
>
> We've started to experience some random failures on Travis related to
> a lack of resources: either Out of Memory errors or too few CPUs to
> run Kubernetes builds.
>
> I tried to rerun those, thinking it was an intermittent error. It started
> happening yesterday and I have not seen it before, so I rather doubt it is
> related to the latest changes.
>
> But I do not want to risk everyone being blocked, so I am now testing on my
> own fork whether reverting the latest CI changes helps. I will let you know
> and will revert if I find that the old CI works in a stable way.
>
> In the meantime, I will cancel all outstanding builds that are blocking
> our queue and will test both the old CI and the new CI in our fork :(
> (the Travis queue limit is not helping).
>
> Can you please hold off on rebasing/pushing new PRs until I have checked it?
>
> Example failures:
>
>
>    - OSError: [Errno 12] Cannot allocate memory (
>    https://travis-ci.org/apache/airflow/jobs/562395978)
>    - [ERROR NumCPU]: the number of available CPUs 1 is less than the
>    required 2 (https://travis-ci.org/apache/airflow/jobs/562395978)
>
>
> J.