Posted to dev@mxnet.apache.org by Pedro Larroy <pe...@gmail.com> on 2018/10/02 00:25:33 UTC

Re: Time out for Travis CI

I think there are two approaches we can take to mitigate the build and
test time problem: on the one hand, use a paid Travis CI plan; on the other,
organize the unit tests into suites and only run a core set of tests, as we
should do on devices, although in this case we reduce coverage.

https://travis-ci.com/plans

Pedro.

On Sat, Sep 29, 2018 at 6:53 PM YiZhi Liu <ea...@gmail.com> wrote:

> This makes sense. Thanks
>
> On Sat, Sep 29, 2018 at 6:36 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > Hey Zhennan, yes this is the exact problem, and I agree with your points
> > completely.  This is why when we first added Travis we attempted to
> > communicate that it would be informational only, and that we'd need to
> > iterate on the config before it would be a test that people should
> > consider 'required'.  Apologies, we should have been more straightforward
> > about those tradeoffs.  The strong point in favour of adding Travis in
> > informational mode was that we had a serious macOS-specific bug that we
> > wanted to verify was fixed.
> >
> > The good news is I've opened a PR which I hope will speed up these builds
> > to the point that they won't rely on caching.  Once it is merged it would
> > be very helpful if you could rebase on this PR and test to ensure that
> > large changes no longer hit the global timeout without cache.
> > https://github.com/apache/incubator-mxnet/pull/12706
> >
> > On Sun, Sep 30, 2018 at 2:48 AM Qin, Zhennan <zh...@intel.com>
> > wrote:
> >
> > > Hi YiZhi and Kellen,
> > >
> > > From my point of view, Travis should be able to pass from a scratch
> > > build. Making the result depend on a ccache hit or miss is not a good
> > > idea. For this PR, as it changed many header files, lots of files need
> > > to be recompiled, just like a scratch build. I think that's the reason
> > > Travis times out. This should be fixed before enabling Travis, as it
> > > will block any change to those base header files. Again, it's not a
> > > special case with this PR only; you can find the same problem on other
> > > PRs:
> > >
> > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
> > >
> > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
> > >
> > > Thanks,
> > > Zhennan
> > >
> > > -----Original Message-----
> > > From: YiZhi Liu [mailto:eazhi.liu@gmail.com]
> > > Sent: Sunday, September 30, 2018 5:15 AM
> > > To: eazhi.liu@gmail.com
> > > Cc: dev@mxnet.incubator.apache.org
> > > Subject: Re: Time out for Travis CI
> > >
> > > while other PRs are all good.
> > > On Sat, Sep 29, 2018 at 2:13 PM YiZhi Liu <ea...@gmail.com> wrote:
> > > >
> > > > Honestly I don't know yet. I can help to investigate. I'm just going by
> > > > the evidence that Travis times out every time it gets re-triggered, 2
> > > > times at least. Correct me if I'm wrong @ Zhennan
> > > >
> > > > On Sat, Sep 29, 2018 at 1:54 PM kellen sunderland <ke...@gmail.com> wrote:
> > > > >
> > > > > Reading over the PR I don't see what aspects would cause extra
> > > > > runtime. YiZhi, could you point them out?
> > > > >
> > > > > On Sat, Sep 29, 2018 at 8:46 PM YiZhi Liu <ea...@gmail.com> wrote:
> > > > >
> > > > > > Kellen, I think this PR introduces extra runtime in CI and thus
> > > > > > causes the timeout. That means, once merged, every later PR will
> > > > > > see the same timeout in Travis.
> > > > > >
> > > > > > So shall we modify the changes to decrease the test running time,
> > > > > > or just disable the Travis CI?
> > > > > >
> > > > > >
> > > > > > On Fri, Sep 28, 2018 at 9:17 PM Qin, Zhennan
> > > > > > <zh...@intel.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi Kellen,
> > > > > > >
> > > > > > > Thanks for your explanation. Do you have a timeline for solving the
> > > > > > > timeout issue? Rebasing can't work for my case. Or shall we run it
> > > > > > > silently, so that it cannot vote X on the overall CI result? I ask
> > > > > > > because most developers are used to ignoring PRs with an 'X'.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Zhennan
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
> > > > > > > Sent: Friday, September 28, 2018 10:38 PM
> > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > Subject: Re: Time out for Travis CI
> > > > > > >
> > > > > > > Hey Zhennan, you're safe to ignore Travis failures for now.
> > > > > > > They're just informational.
> > > > > > >
> > > > > > > The reason you sometimes see quick builds and sometimes see slow
> > > > > > > builds is that we're making use of ccache in between builds.  If your
> > > > > > > PR is similar to what's in master you should build very quickly; if
> > > > > > > not, it's going to take a while and likely time out.  If you see
> > > > > > > timeouts, rebasing may speed things up.  Unfortunately the timeouts
> > > > > > > are global and we're not able to increase them.  I'm hoping that
> > > > > > > adding artifact caching will speed up future builds to the point
> > > > > > > that test runs and builds can be executed in under the global limit
> > > > > > > (which is ~50 minutes).
> > > > > > >
> > > > > > > -Kellen
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Sep 28, 2018 at 4:05 PM Qin, Zhennan
> > > > > > > <zh...@intel.com> wrote:
> > > > > > >
> > > > > > > > Hi MXNet devs,
> > > > > > > >
> > > > > > > > I've been struggling with the new Travis CI for a while; it
> > > > > > > > always times out for this PR:
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/12530
> > > > > > > >
> > > > > > > > Most of the time, Jenkins CI passes, while Travis can't
> > > > > > > > finish within 50 minutes. This PR shouldn't affect the build
> > > > > > > > time or unit test time much. Also, I saw that other PRs have
> > > > > > > > the same problem, e.g.
> > > > > > > >
> > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
> > > > > > > >
> > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
> > > > > > > >
> > > > > > > > According to the timestamps from Travis, all passing PRs contain
> > > > > > > > only small code changes and can complete `make -j2` within
> > > > > > > > 25s. But in the timeout case, `make -j2` needs about 1600s.
> > > > > > > > Does Travis do an incremental build for each test? Shall we
> > > > > > > > increase the time limit for large PRs? Can we add more timestamps
> > > > > > > > for the build and unit test stages to help understand what's
> > > > > > > > going on there?
> > > > > > > >
> > > > > > > > Thanks in advance,
> > > > > > > > Zhennan
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Yizhi Liu
> > > > > > DMLC member
> > > > > > Amazon Web Services
> > > > > > Vancouver, Canada
> > > > > >

Re: Time out for Travis CI

Posted by Qing Lan <la...@live.com>.
From the link it looks like "Travis CI offers a free account" rather than Apache buying one. It may just be a free user account with an extension on the number of nodes it can run on. I think we may need to reach out to Travis or Apache to clarify whether we currently have the same service as the paid version, rather than an extension of a "free user account".

Thanks,
Qing

On 10/1/18, 6:15 PM, "kellen sunderland" <ke...@gmail.com> wrote:

    I actually thought we were already using a paid plan through Apache
    https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci
    
    On Tue, Oct 2, 2018, 3:11 AM Qing Lan <la...@live.com> wrote:
    
    > Are we currently on a free plan? If we are, probably the unlimited build
    > minutes would help
    >
    > Thanks,
    > Qing
    >
    > On 10/1/18, 6:08 PM, "kellen sunderland" <ke...@gmail.com> wrote:
    >
    >     Does the global time out change for paid plans?  I looked into it briefly
    >     but didn't see anything that would indicate it does.


Re: Time out for Travis CI

Posted by kellen sunderland <ke...@gmail.com>.
Well, I'd propose we get clarification from Travis before bringing the issue up
with infra.  No point debating something with infra or amongst ourselves if
it's not possible.

Orthogonal to the paid account option, let's merge this speedup to unblock
Intel.

On Oct 2, 2018 4:37 AM, "Marco de Abreu"
<ma...@googlemail.com.invalid> wrote:

I think the timeout and other limitations have been imposed by Apache
Infra and not by Travis. They didn't say that specifically, but they
already made me aware that we might get further restrictions if we consume
too many resources.


kellen sunderland <ke...@gmail.com> wrote on Tue, Oct 2, 2018, 04:34:


> Still worth following up with Travis (I've already messaged them). They're
> in the middle of reorganizing their business model and merging paid and
> free accounts into the same service, so maybe this policy is changing.  It
> doesn't make a lot of sense to me that public repo accounts would have
> timeout limits that are different to private repo accounts in cases where
> they are both paid.
>
> On Tue, Oct 2, 2018, 4:27 AM Marco de Abreu
> <ma...@googlemail.com.invalid> wrote:
>
> > Apache has its own shared Travis fleet. We are basically using an
> > on-premise version of the paid Travis plan. That was the information I
> > got from Infra when I had a chat with them a few days ago. But from that
> > conversation it was made pretty clear that we cannot increase the limits.
> >
> > -Marco
> >
> > kellen sunderland <ke...@gmail.com> wrote on Tue, Oct 2, 2018, 03:25:
> >
> > > Interesting, this page seems to indicate that private projects do have a
> > > longer time out.  I'll drop Travis a quick email and see what the deal
> > > would be for our project.
> > > https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts.
> > >
> > > On Tue, Oct 2, 2018, 3:15 AM kellen sunderland <
> > > kellen.sunderland@gmail.com>
> > > wrote:
> > >
> > > > I actually thought we were already using a paid plan through Apache
> > > >
> https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci
> > > >
> > > > On Tue, Oct 2, 2018, 3:11 AM Qing Lan <la...@live.com> wrote:
> > > >
> > > >> Are we currently on a free plan? If we are, probably the unlimited
> > build
> > > >> minutes would help
> > > >>
> > > >> Thanks,
> > > >> Qing
> > > >>
> > > >> On 10/1/18, 6:08 PM, "kellen sunderland" <
> > kellen.sunderland@gmail.com>
> > > >> wrote:
> > > >>
> > > >>     Does the global time out change for paid plans?  I looked into
> it
> > > >> briefly
> > > >>     but didn't see anything that would indicate it does.
> > > >>
> > > >>     On Tue, Oct 2, 2018, 2:25 AM Pedro Larroy <
> > > >> pedro.larroy.lists@gmail.com>
> > > >>     wrote:
> > > >>
> > > >>     > I think there's two approaches that we can take to mitigate
> the
> > > >> build &
> > > >>     > test time problem, in one hand use a paid travis CI plan, in
> > other
> > > >> improve
> > > >>     > the unit tests in suites and only run a core set of tests, as
> we
> > > >> should do
> > > >>     > on devices, but on this case we reduce coverage.
> > > >>     >
> > > >>     > https://travis-ci.com/plans
> > > >>     >
> > > >>     > Pedro.
> > > >>     >
> > > >>     > On Sat, Sep 29, 2018 at 6:53 PM YiZhi Liu <
> eazhi.liu@gmail.com>
> > > >> wrote:
> > > >>     >
> > > >>     > > This makes sense. Thanks
> > > >>     > >
> > > >>     > > On Sat, Sep 29, 2018 at 6:36 PM kellen sunderland <
> > > >>     > > kellen.sunderland@gmail.com> wrote:
> > > >>     > >
> > > >>     > > > Hey Zhennan, yes this is the exact problem, and I agree
> with
> > > >> your
> > > >>     > points
> > > >>     > > > completely.  This is why when we first added Travis we
> > > >> attempted to
> > > >>     > > > communicate that it would be informational only, and that
> > we'd
> > > >> need to
> > > >>     > > > iterate on the config before it would be a test that
> people
> > > >> should
> > > >>     > > consider
> > > >>     > > > 'required'.  Apologies, we should have been more
> > > >> straightforward about
> > > >>     > > > those tradeoffs.  The strong point in favour of adding
> > Travis
> > > in
> > > >>     > > > informational mode was that we had a serious MacOS
> specific
> > > bug
> > > >> that we
> > > >>     > > > wanted to verify was fixed.
> > > >>     > > >
> > > >>     > > > The good news is I've opened a PR which I hope will speed
> up
> > > >> these
> > > >>     > builds
> > > >>     > > > to the point that they won't rely on caching.  Once it is
> > > >> merged it
> > > >>     > would
> > > >>     > > > be very helpful if you could rebase on this PR and test
to
> > > >> ensure that
> > > >>     > > > large changes no longer hit the global timeout without
> > cache.
> > > >>     > > > https://github.com/apache/incubator-mxnet/pull/12706
> > > >>     > > >
> > > >>     > > > On Sun, Sep 30, 2018 at 2:48 AM Qin, Zhennan <
> > > >> zhennan.qin@intel.com>
> > > >>     > > > wrote:
> > > >>     > > >
> > > >>     > > > > Hi YiZhi and Kellen,
> > > >>     > > > >
> > > >>     > > > > From my point of view, travis should be able to get
> passed
> > > >> from a
> > > >>     > > scratch
> > > >>     > > > > build. Pending result on ccache hit/miss is not a good
> > idea.
> > > >> For this
> > > >>     > > PR,
> > > >>     > > > > as it changed many header file, lots of files need be
> > > >> recompiled,
> > > >>     > just
> > > >>     > > > like
> > > >>     > > > > a scratch build. I think that's the reason that travis
> > > >> timeout. This
> > > >>     > > > should
> > > >>     > > > > be fixed before enabling travis, as it will block any
> > change
> > > >> to those
> > > >>     > > > base
> > > >>     > > > > header file. Again, it's not a special case with this
PR
> > > >> only, you
> > > >>     > can
> > > >>     > > > find
> > > >>     > > > > same problem on other PRs:
> > > >>     > > > >
> > > >>     > > > >
> > > >>     > > > >
> > > >>     > > >
> > > >>     > >
> > > >>     >
> > > >>
> > >
> >
>
https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
> > > >>     > > > >
> > > >>     > > > >
> > > >>     > > >
> > > >>     > >
> > > >>     >
> > > >>
> > >
> >
>
https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
> > > >>     > > > >
> > > >>     > > > >
> > > >>     > > > > Thanks,
> > > >>     > > > > Zhennan
> > > >>     > > > >
> > > >>     > > > > -----Original Message-----
> > > >>     > > > > From: YiZhi Liu [mailto:eazhi.liu@gmail.com]
> > > >>     > > > > Sent: Sunday, September 30, 2018 5:15 AM
> > > >>     > > > > To: eazhi.liu@gmail.com
> > > >>     > > > > Cc: dev@mxnet.incubator.apache.org
> > > >>     > > > > Subject: Re: Time out for Travis CI
> > > >>     > > > >
> > > >>     > > > > while other PRs are all good.
> > > >>     > > > > On Sat, Sep 29, 2018 at 2:13 PM YiZhi Liu <
> > > >> eazhi.liu@gmail.com>
> > > >>     > wrote:
> > > >>     > > > > >
> > > >>     > > > > > Honestly I don't know yet. I can help to investigate.
> > Just
> > > >> given
> > > >>     > the
> > > >>     > > > > > evidence that, travis timeout every time it gets
> > > >> re-triggered - 2
> > > >>     > > > > > times at least. Correct me if I'm wrong @ Zhennan On
> > Sat,
> > > >> Sep 29,
> > > >>     > > 2018
> > > >>     > > > > > at 1:54 PM kellen sunderland <
> > kellen.sunderland@gmail.com
> > > >
> > > >> wrote:
> > > >>     > > > > > >
> > > >>     > > > > > > Reading over the PR I don't see what aspects would
> > cause
> > > >> extra
> > > >>     > > > > > > runtime YiZhi, could you point them out?
> > > >>     > > > > > >
> > > >>     > > > > > > On Sat, Sep 29, 2018 at 8:46 PM YiZhi Liu <
> > > >> eazhi.liu@gmail.com>
> > > >>     > > > wrote:
> > > >>     > > > > > >
> > > >>     > > > > > > > Kellen, I think this PR introduces extra runtime
> in
> > > CI,
> > > >> thus
> > > >>     > > > > > > > causes the timeout. Which means, once merged,
> every
> > PR
> > > >> later
> > > >>     > will
> > > >>     > > > > > > > see same timeout in travis.
> > > >>     > > > > > > >
> > > >>     > > > > > > > So shall we modify the changes to decrease the
> test
> > > >> running
> > > >>     > time?
> > > >>     > > > > > > > or just disable the Travis CI?
> > > >>     > > > > > > >
> > > >>     > > > > > > >
> > > >>     > > > > > > > On Fri, Sep 28, 2018 at 9:17 PM Qin, Zhennan
> > > >>     > > > > > > > <zh...@intel.com>
> > > >>     > > > > > > > wrote:
> > > >>     > > > > > > > >
> > > >>     > > > > > > > > Hi Kellen,
> > > >>     > > > > > > > >
> > > >>     > > > > > > > > Thanks for your explanation. Do you have a time
> > plan
> > > >> to solve
> > > >>     > > > > > > > > the
> > > >>     > > > > > > > timeout issue? Rebasing can't work for my case.
Or
> > > >> shall we run
> > > >>     > > it
> > > >>     > > > > > > > silently to disallow it voting X for overall CI
> > > result?
> > > >> Because
> > > >>     > > > > > > > most developers are used to ignore the PRs with
> 'X'.
> > > >>     > > > > > > > >
> > > >>     > > > > > > > > Thanks,
> > > >>     > > > > > > > > Zhennan
> > > >>     > > > > > > > >
> > > >>     > > > > > > > > -----Original Message-----
> > > >>     > > > > > > > > From: kellen sunderland [mailto:
> > > >> kellen.sunderland@gmail.com]
> > > >>     > > > > > > > > Sent: Friday, September 28, 2018 10:38 PM
> > > >>     > > > > > > > > To: dev@mxnet.incubator.apache.org
> > > >>     > > > > > > > > Subject: Re: Time out for Travis CI
> > > >>     > > > > > > > >
> > > >>     > > > > > > > > Hey Zhennan, you're safe to ignore Travis
> failures
> > > >> for now.
> > > >>     > > > > > > > > They're
> > > >>     > > > > > > > just informational.
> > > >>     > > > > > > > >
> > > >>     > > > > > > > > The reason you sometimes see quick builds and
> > > >> sometimes see
> > > >>     > > slow
> > > >>     > > > > > > > > builds
> > > >>     > > > > > > > is that we're making use of ccache in between
> > builds.
> > > >> If your
> > > >>     > PR
> > > >>     > > > > > > > is similar to what's in master you should build
> very
> > > >> quickly,
> > > >>     > if
> > > >>     > > > > > > > not it's going to take a while and likely time
> out.
> > > If
> > > >> you see
> > > >>     > > > > > > > timeouts rebasing may speed things up.
> > Unfortunately
> > > >> the
> > > >>     > > timeouts
> > > >>     > > > > > > > are global and we're not able to increase them.
> I'm
> > > >> hoping
> > > >>     > that
> > > >>     > > > > > > > adding artifact caching will speed up future
> builds
> > to
> > > >> the
> > > >>     > point
> > > >>     > > > > > > > that test runs and builds can be executed in
under
> > the
> > > >> global
> > > >>     > > limit
> > > >>     > > > > (which is ~50 minutes).
> > > >>     > > > > > > > >
> > > >>     > > > > > > > > -Kellen
> > > >>     > > > > > > > >
> > > >>     > > > > > > > >
> > > >>     > > > > > > > > On Fri, Sep 28, 2018 at 4:05 PM Qin, Zhennan
> > > >>     > > > > > > > > <zh...@intel.com>
> > > >>     > > > > > > > wrote:
> > > >>     > > > > > > > >
> > > >>     > > > > > > > > > Hi MXNet devs,
> > > >>     > > > > > > > > >
> > > >>     > > > > > > > > > I'm struggled with new Travis CI for a while, it always run
> > > >>     > > > > > > > > > time out for this PR:
> > > >>     > > > > > > > > >
> > > >>     > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12530
> > > >>     > > > > > > > > >
> > > >>     > > > > > > > > > Most of the time, Jenkins CI can pass, while Travis can't be
> > > >>     > > > > > > > > > finished within 50 minutes. For this PR, it shouldn't affect
> > > >>     > > > > > > > > > much on the build time or unit test time. Also, I saw other PR
> > > >>     > > > > > > > > > has same problem, eg.
> > > >>     > > > > > > > > >
> > > >>     > > > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
> > > >>     > > > > > > > > >
> > > >>     > > > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
> > > >>     > > > > > > > > >
> > > >>     > > > > > > > > > According to the time stamp from Travis, all passed PR are
> > > >>     > > > > > > > > > within small code change, and can complete `make -j2` within
> > > >>     > > > > > > > > > 25s. But for timeout case, 'make -j2' will need about 1600s.
> > > >>     > > > > > > > > > Does Travis do incremental build for each test? Shall we
> > > >>     > > > > > > > > > increase time limit for large PR? Can we add more time stamp
> > > >>     > > > > > > > > > for build and unites stage to help understand what's going on there?
> > > >>     > > > > > > > > >
> > > >>     > > > > > > > > > Thanks in advance,
> > > >>     > > > > > > > > > Zhennan
> > > >>     > > > > > > > > >
> > > >>     > > > > > > >
> > > >>     > > > > > > >
> > > >>     > > > > > > >
> > > >>     > > > > > > > --
> > > >>     > > > > > > > Yizhi Liu
> > > >>     > > > > > > > DMLC member
> > > >>     > > > > > > > Amazon Web Services
> > > >>     > > > > > > > Vancouver, Canada
> > > >>     > > > > > > >

Re: Time out for Travis CI

Posted by Marco de Abreu <ma...@googlemail.com.INVALID>.
I think the timeout and other limits were imposed by Apache Infra, not by
Travis. They didn't say so explicitly, but they have already made me aware
that we might face further restrictions if we consume too many resources.

kellen sunderland <ke...@gmail.com> wrote on Tue, Oct 2, 2018, 04:34:

> Still worth following up with Travis (I've already messaged them).  They're
> in the middle of reorganizing their business model and merging paid and
> free accounts into the same service, so maybe this policy is changing.  It
> doesn't make a lot of sense to me that public repo accounts would have
> timeout limits that are different to private repo accounts in cases where
> they are both paid.

Re: Time out for Travis CI

Posted by kellen sunderland <ke...@gmail.com>.
Still worth following up with Travis (I've already messaged them).  They're
in the middle of reorganizing their business model and merging paid and
free accounts into the same service, so maybe this policy is changing.  It
doesn't make a lot of sense to me that public repo accounts would have
timeout limits that are different to private repo accounts in cases where
they are both paid.

On Tue, Oct 2, 2018, 4:27 AM Marco de Abreu
<ma...@googlemail.com.invalid> wrote:

> Apache has its own shared Travis fleet. We are basically using an
> on-premise version of the paid Travis plan. That was the information I got
> from Infra when I had a chat with them a few days ago. But from that
> conversation it was made pretty clear that we cannot increase the limits.
>
> -Marco

Re: Time out for Travis CI

Posted by Marco de Abreu <ma...@googlemail.com.INVALID>.
Apache has its own shared Travis fleet. We are basically using an
on-premise version of the paid Travis plan. That was the information I got
from Infra when I had a chat with them a few days ago. But from that
conversation it was made pretty clear that we cannot increase the limits.

-Marco

kellen sunderland <ke...@gmail.com> wrote on Tue, Oct 2, 2018, 03:25:

> Interesting, this page seems to indicate that private projects do have a
> longer time out.  I'll drop Travis a quick email and see what the deal
> would be for our project.
> https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts.
>
> On Tue, Oct 2, 2018, 3:15 AM kellen sunderland <kellen.sunderland@gmail.com> wrote:
>
> > I actually thought we were already using a paid plan through Apache
> > https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci
> >
> > On Tue, Oct 2, 2018, 3:11 AM Qing Lan <la...@live.com> wrote:
> >
> >> Are we currently on a free plan? If we are, probably the unlimited build
> >> minutes would help
> >>
> >> Thanks,
> >> Qing
> >>
> >> On 10/1/18, 6:08 PM, "kellen sunderland" <ke...@gmail.com> wrote:
> >>
> >>     Does the global time out change for paid plans?  I looked into it
> >>     briefly but didn't see anything that would indicate it does.
> >>
> >>     On Tue, Oct 2, 2018, 2:25 AM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
> >>
> >>     > I think there's two approaches that we can take to mitigate the build &
> >>     > test time problem, in one hand use a paid travis CI plan, in other improve
> >>     > the unit tests in suites and only run a core set of tests, as we should do
> >>     > on devices, but on this case we reduce coverage.
> >>     >
> >>     > https://travis-ci.com/plans
> >>     >
> >>     > Pedro.
> >>     >
> >>     > On Sat, Sep 29, 2018 at 6:53 PM YiZhi Liu <ea...@gmail.com> wrote:
> >>     >
> >>     > > This makes sense. Thanks
> >>     > >
> >>     > > On Sat, Sep 29, 2018 at 6:36 PM kellen sunderland <
> >>     > > kellen.sunderland@gmail.com> wrote:
> >>     > >
> >>     > > > Hey Zhennan, yes this is the exact problem, and I agree with your points
> >>     > > > completely.  This is why when we first added Travis we attempted to
> >>     > > > communicate that it would be informational only, and that we'd need to
> >>     > > > iterate on the config before it would be a test that people should consider
> >>     > > > 'required'.  Apologies, we should have been more straightforward about
> >>     > > > those tradeoffs.  The strong point in favour of adding Travis in
> >>     > > > informational mode was that we had a serious MacOS specific bug that we
> >>     > > > wanted to verify was fixed.
> >>     > > >
> >>     > > > The good news is I've opened a PR which I hope will speed up these builds
> >>     > > > to the point that they won't rely on caching.  Once it is merged it would
> >>     > > > be very helpful if you could rebase on this PR and test to ensure that
> >>     > > > large changes no longer hit the global timeout without cache.
> >>     > > > https://github.com/apache/incubator-mxnet/pull/12706
> >>     > > >
> >>     > > > On Sun, Sep 30, 2018 at 2:48 AM Qin, Zhennan <zhennan.qin@intel.com> wrote:
> >>     > > >
> >>     > > > > Hi YiZhi and Kellen,
> >>     > > > >
> >>     > > > > From my point of view, travis should be able to get passed
> >> from a
> >>     > > scratch
> >>     > > > > build. Pending result on ccache hit/miss is not a good idea.
> >> For this
> >>     > > PR,
> >>     > > > > as it changed many header file, lots of files need be
> >> recompiled,
> >>     > just
> >>     > > > like
> >>     > > > > a scratch build. I think that's the reason that travis
> >> timeout. This
> >>     > > > should
> >>     > > > > be fixed before enabling travis, as it will block any change
> >> to those
> >>     > > > base
> >>     > > > > header file. Again, it's not a special case with this PR
> >> only, you
> >>     > can
> >>     > > > find
> >>     > > > > same problem on other PRs:
> >>     > > > >
> >>     > > > >
> >>     > > > >
> >>     > > >
> >>     > >
> >>     >
> >>
> https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
> >>     > > > >
> >>     > > > >
> >>     > > >
> >>     > >
> >>     >
> >>
> https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
> >>     > > > >
> >>     > > > >
> >>     > > > > Thanks,
> >>     > > > > Zhennan
> >>     > > > >
> >>     > > > > -----Original Message-----
> >>     > > > > From: YiZhi Liu [mailto:eazhi.liu@gmail.com]
> >>     > > > > Sent: Sunday, September 30, 2018 5:15 AM
> >>     > > > > To: eazhi.liu@gmail.com
> >>     > > > > Cc: dev@mxnet.incubator.apache.org
> >>     > > > > Subject: Re: Time out for Travis CI
> >>     > > > >
> >>     > > > > while other PRs are all good.
> >>     > > > > On Sat, Sep 29, 2018 at 2:13 PM YiZhi Liu <
> >> eazhi.liu@gmail.com>
> >>     > wrote:
> >>     > > > > >
> >>     > > > > > Honestly I don't know yet. I can help to investigate. Just
> >> given
> >>     > the
> >>     > > > > > evidence that, travis timeout every time it gets
> >> re-triggered - 2
> >>     > > > > > times at least. Correct me if I'm wrong @ Zhennan On Sat,
> >> Sep 29,
> >>     > > 2018
> >>     > > > > > at 1:54 PM kellen sunderland <kellen.sunderland@gmail.com
> >
> >> wrote:
> >>     > > > > > >
> >>     > > > > > > Reading over the PR I don't see what aspects would cause
> >> extra
> >>     > > > > > > runtime YiZhi, could you point them out?
> >>     > > > > > >
> >>     > > > > > > On Sat, Sep 29, 2018 at 8:46 PM YiZhi Liu <
> >> eazhi.liu@gmail.com>
> >>     > > > wrote:
> >>     > > > > > >
> >>     > > > > > > > Kellen, I think this PR introduces extra runtime in
> CI,
> >> thus
> >>     > > > > > > > causes the timeout. Which means, once merged, every PR
> >> later
> >>     > will
> >>     > > > > > > > see same timeout in travis.
> >>     > > > > > > >
> >>     > > > > > > > So shall we modify the changes to decrease the test
> >> running
> >>     > time?
> >>     > > > > > > > or just disable the Travis CI?
> >>     > > > > > > >
> >>     > > > > > > >
> >>     > > > > > > > On Fri, Sep 28, 2018 at 9:17 PM Qin, Zhennan
> >>     > > > > > > > <zh...@intel.com>
> >>     > > > > > > > wrote:
> >>     > > > > > > > >
> >>     > > > > > > > > Hi Kellen,
> >>     > > > > > > > >
> >>     > > > > > > > > Thanks for your explanation. Do you have a time plan
> >> to solve
> >>     > > > > > > > > the
> >>     > > > > > > > timeout issue? Rebasing can't work for my case. Or
> >> shall we run
> >>     > > it
> >>     > > > > > > > silently to disallow it voting X for overall CI
> result?
> >> Because
> >>     > > > > > > > most developers are used to ignore the PRs with 'X'.
> >>     > > > > > > > >
> >>     > > > > > > > > Thanks,
> >>     > > > > > > > > Zhennan
> >>     > > > > > > > >
> >>     > > > > > > > > -----Original Message-----
> >>     > > > > > > > > From: kellen sunderland [mailto:
> >> kellen.sunderland@gmail.com]
> >>     > > > > > > > > Sent: Friday, September 28, 2018 10:38 PM
> >>     > > > > > > > > To: dev@mxnet.incubator.apache.org
> >>     > > > > > > > > Subject: Re: Time out for Travis CI
> >>     > > > > > > > >
> >>     > > > > > > > > Hey Zhennan, you're safe to ignore Travis failures
> >> for now.
> >>     > > > > > > > > They're
> >>     > > > > > > > just informational.
> >>     > > > > > > > >
> >>     > > > > > > > > The reason you sometimes see quick builds and
> >> sometimes see
> >>     > > slow
> >>     > > > > > > > > builds
> >>     > > > > > > > is that we're making use of ccache in between builds.
> >> If your
> >>     > PR
> >>     > > > > > > > is similar to what's in master you should build very
> >> quickly,
> >>     > if
> >>     > > > > > > > not it's going to take a while and likely time out.
> If
> >> you see
> >>     > > > > > > > timeouts rebasing may speed things up.  Unfortunately
> >> the
> >>     > > timeouts
> >>     > > > > > > > are global and we're not able to increase them.  I'm
> >> hoping
> >>     > that
> >>     > > > > > > > adding artifact caching will speed up future builds to
> >> the
> >>     > point
> >>     > > > > > > > that test runs and builds can be executed in under the
> >> global
> >>     > > limit
> >>     > > > > (which is ~50 minutes).
> >>     > > > > > > > >
> >>     > > > > > > > > -Kellen
> >>     > > > > > > > >
> >>     > > > > > > > >
> >>     > > > > > > > > On Fri, Sep 28, 2018 at 4:05 PM Qin, Zhennan
> >>     > > > > > > > > <zh...@intel.com>
> >>     > > > > > > > wrote:
> >>     > > > > > > > >
> >>     > > > > > > > > > Hi MXNet devs,
> >>     > > > > > > > > >
> >>     > > > > > > > > > I'm struggled with new Travis CI for a while, it
> >> always run
> >>     > > > > > > > > > time out for this PR:
> >>     > > > > > > > > >
> >> https://github.com/apache/incubator-mxnet/pull/12530
> >>     > > > > > > > > >
> >>     > > > > > > > > > Most of the time, Jenkins CI can pass, while
> Travis
> >> can't
> >>     > be
> >>     > > > > > > > > > finished within 50 minutes. For this PR, it
> >> shouldn't
> >>     > affect
> >>     > > > > > > > > > much on the build time or unit test time. Also, I
> >> saw other
> >>     > > PR
> >>     > > > > has same problem, eg.
> >>     > > > > > > > > >
> >>     > > > > > > > > >
> >>     > > > > > > > > >
> >>     > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?
> >>     > > > > > > > > > utm_sour ce=github_status&utm_medium=notification
> >>     > > > > > > > > >
> >>     > > > > > > > > >
> >>     > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?
> >>     > > > > > > > > > utm_sour ce=github_status&utm_medium=notification
> >>     > > > > > > > > >
> >>     > > > > > > > > > According to the time stamp from Travis, all
> passed
> >> PR are
> >>     > > > > > > > > > within small code change, and can complete `make
> >> -j2`
> >>     > within
> >>     > > > > > > > > > 25s. But for timeout case, 'make -j2' will need
> >> about
> >>     > 1600s.
> >>     > > > > > > > > > Does Travis do incremental build for each test?
> >> Shall we
> >>     > > > > > > > > > increase time limit for large PR? Can we add more
> >> time
> >>     > stamp
> >>     > > > > > > > > > for build and unites stage to
> >>     > > > > > > > help understand what's going on there?
> >>     > > > > > > > > >
> >>     > > > > > > > > > Thanks in advance,
> >>     > > > > > > > > > Zhennan
> >>     > > > > > > > > >
> >>     > > > > > > >
> >>     > > > > > > >
> >>     > > > > > > >
> >>     > > > > > > > --
> >>     > > > > > > > Yizhi Liu
> >>     > > > > > > > DMLC member
> >>     > > > > > > > Amazon Web Services
> >>     > > > > > > > Vancouver, Canada
> >>     > > > > > > >
> >>     > > > > >
> >>     > > > > >
> >>     > > > > >
> >>     > > > > > --
> >>     > > > > > Yizhi Liu
> >>     > > > > > DMLC member
> >>     > > > > > Amazon Web Services
> >>     > > > > > Vancouver, Canada
> >>     > > > >
> >>     > > > >
> >>     > > > >
> >>     > > > > --
> >>     > > > > Yizhi Liu
> >>     > > > > DMLC member
> >>     > > > > Amazon Web Services
> >>     > > > > Vancouver, Canada
> >>     > > > >
> >>     > > >
> >>     > > --
> >>     > > Yizhi Liu
> >>     > > DMLC member
> >>     > > Amazon Web Services
> >>     > > Vancouver, Canada
> >>     > >
> >>     >
> >>
> >>
> >>
>

Re: Time out for Travis CI

Posted by kellen sunderland <ke...@gmail.com>.
Interesting, this page seems to indicate that private projects do have a
longer timeout.  I'll drop Travis a quick email and see what the deal
would be for our project.
https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts

On Tue, Oct 2, 2018, 3:15 AM kellen sunderland <ke...@gmail.com>
wrote:

> I actually thought we were already using a paid plan through Apache
> https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci
>
> On Tue, Oct 2, 2018, 3:11 AM Qing Lan <la...@live.com> wrote:
>
>> Are we currently on a free plan? If we are, probably the unlimited build
>> minutes would help
>>
>> Thanks,
>> Qing
>>
>> On 10/1/18, 6:08 PM, "kellen sunderland" <ke...@gmail.com>
>> wrote:
>>
>>     Does the global time out change for paid plans?  I looked into it briefly
>>     but didn't see anything that would indicate it does.
>>

Re: Time out for Travis CI

Posted by kellen sunderland <ke...@gmail.com>.
I actually thought we were already using a paid plan through Apache:
https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci

On Tue, Oct 2, 2018, 3:11 AM Qing Lan <la...@live.com> wrote:

> Are we currently on a free plan? If we are, probably the unlimited build
> minutes would help
>
> Thanks,
> Qing
>
> On 10/1/18, 6:08 PM, "kellen sunderland" <ke...@gmail.com>
> wrote:
>
>     Does the global time out change for paid plans?  I looked into it briefly
>     but didn't see anything that would indicate it does.
>

Re: Time out for Travis CI

Posted by Qing Lan <la...@live.com>.
Are we currently on a free plan? If we are, the unlimited build minutes would probably help.

Thanks,
Qing

On 10/1/18, 6:08 PM, "kellen sunderland" <ke...@gmail.com> wrote:

    Does the global time out change for paid plans?  I looked into it briefly
    but didn't see anything that would indicate it does.
    
    On Tue, Oct 2, 2018, 2:25 AM Pedro Larroy <pe...@gmail.com>
    wrote:
    
    > I think there's two approaches that we can take to mitigate the build &
    > test time problem, in one hand use a paid travis CI plan, in other improve
    > the unit tests in suites and only run a core set of tests, as we should do
    > on devices, but on this case we reduce coverage.
    >
    > https://travis-ci.com/plans
    >
    > Pedro.
    >
    > On Sat, Sep 29, 2018 at 6:53 PM YiZhi Liu <ea...@gmail.com> wrote:
    >
    > > This makes sense. Thanks
    > >
    > > On Sat, Sep 29, 2018 at 6:36 PM kellen sunderland <
    > > kellen.sunderland@gmail.com> wrote:
    > >
    > > > Hey Zhennan, yes this is the exact problem, and I agree with your points
    > > > completely.  This is why when we first added Travis we attempted to
    > > > communicate that it would be informational only, and that we'd need to
    > > > iterate on the config before it would be a test that people should consider
    > > > 'required'.  Apologies, we should have been more straightforward about
    > > > those tradeoffs.  The strong point in favour of adding Travis in
    > > > informational mode was that we had a serious MacOS specific bug that we
    > > > wanted to verify was fixed.
    > > >
    > > > The good news is I've opened a PR which I hope will speed up these builds
    > > > to the point that they won't rely on caching.  Once it is merged it would
    > > > be very helpful if you could rebase on this PR and test to ensure that
    > > > large changes no longer hit the global timeout without cache.
    > > > https://github.com/apache/incubator-mxnet/pull/12706
    > > >
    > > > On Sun, Sep 30, 2018 at 2:48 AM Qin, Zhennan <zh...@intel.com>
    > > > wrote:
    > > >
    > > > > Hi YiZhi and Kellen,
    > > > >
    > > > > From my point of view, travis should be able to get passed from a scratch
    > > > > build. Pending result on ccache hit/miss is not a good idea. For this PR,
    > > > > as it changed many header file, lots of files need be recompiled, just like
    > > > > a scratch build. I think that's the reason that travis timeout. This should
    > > > > be fixed before enabling travis, as it will block any change to those base
    > > > > header file. Again, it's not a special case with this PR only, you can find
    > > > > same problem on other PRs:
    > > > >
    > > > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
    > > > >
    > > > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
    > > > >
    > > > >
    > > > > Thanks,
    > > > > Zhennan
    > > > >
    > > > > -----Original Message-----
    > > > > From: YiZhi Liu [mailto:eazhi.liu@gmail.com]
    > > > > Sent: Sunday, September 30, 2018 5:15 AM
    > > > > To: eazhi.liu@gmail.com
    > > > > Cc: dev@mxnet.incubator.apache.org
    > > > > Subject: Re: Time out for Travis CI
    > > > >
    > > > > while other PRs are all good.
    > > > > On Sat, Sep 29, 2018 at 2:13 PM YiZhi Liu <ea...@gmail.com> wrote:
    > > > > >
    > > > > > Honestly I don't know yet. I can help to investigate. I'm just going
    > > > > > by the evidence that Travis times out every time it gets re-triggered,
    > > > > > at least 2 times. Correct me if I'm wrong @ Zhennan
    > > > > >
    > > > > > On Sat, Sep 29, 2018 at 1:54 PM kellen sunderland <ke...@gmail.com> wrote:
    > > > > > >
    > > > > > > Reading over the PR I don't see what aspects would cause extra
    > > > > > > runtime. YiZhi, could you point them out?
    > > > > > >
    > > > > > > On Sat, Sep 29, 2018 at 8:46 PM YiZhi Liu <ea...@gmail.com> wrote:
    > > > > > > >
    > > > > > > > Kellen, I think this PR introduces extra runtime in CI and thus
    > > > > > > > causes the timeout. This means that, once it is merged, every
    > > > > > > > later PR will see the same timeout in Travis.
    > > > > > > >
    > > > > > > > So shall we modify the changes to decrease the test running time,
    > > > > > > > or just disable the Travis CI?
    > > > > > > >
    > > > > > > >
    > > > > > > > On Fri, Sep 28, 2018 at 9:17 PM Qin, Zhennan <zh...@intel.com> wrote:
    > > > > > > > >
    > > > > > > > > Hi Kellen,
    > > > > > > > >
    > > > > > > > > Thanks for your explanation. Do you have a timeline for solving
    > > > > > > > > the timeout issue? Rebasing doesn't work in my case. Or shall we
    > > > > > > > > run Travis silently so that it cannot vote 'X' on the overall CI
    > > > > > > > > result? Most developers are used to ignoring PRs marked with
    > > > > > > > > an 'X'.
    > > > > > > > >
    > > > > > > > > Thanks,
    > > > > > > > > Zhennan
    > > > > > > > >
    > > > > > > > > -----Original Message-----
    > > > > > > > > From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
    > > > > > > > > Sent: Friday, September 28, 2018 10:38 PM
    > > > > > > > > To: dev@mxnet.incubator.apache.org
    > > > > > > > > Subject: Re: Time out for Travis CI
    > > > > > > > >
    > > > > > > > > Hey Zhennan, you're safe to ignore Travis failures for now.
    > > > > > > > > They're just informational.
    > > > > > > > >
    > > > > > > > > The reason you sometimes see quick builds and sometimes see slow
    > > > > > > > > builds is that we're making use of ccache in between builds.  If
    > > > > > > > > your PR is similar to what's in master you should build very
    > > > > > > > > quickly, if not it's going to take a while and likely time out.
    > > > > > > > > If you see timeouts rebasing may speed things up.  Unfortunately
    > > > > > > > > the timeouts are global and we're not able to increase them.  I'm
    > > > > > > > > hoping that adding artifact caching will speed up future builds to
    > > > > > > > > the point that test runs and builds can be executed in under the
    > > > > > > > > global limit (which is ~50 minutes).
    > > > > > > > >
    > > > > > > > > -Kellen
    > > > > > > > >
    > > > > > > > >
    > > > > > > > > On Fri, Sep 28, 2018 at 4:05 PM Qin, Zhennan <zh...@intel.com> wrote:
    > > > > > > > >
    > > > > > > > > > Hi MXNet devs,
    > > > > > > > > >
    > > > > > > > > > I've been struggling with the new Travis CI for a while; it
    > > > > > > > > > always times out for this PR:
    > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12530
    > > > > > > > > >
    > > > > > > > > > Most of the time Jenkins CI passes, while Travis can't finish
    > > > > > > > > > within 50 minutes. This PR shouldn't have much effect on the
    > > > > > > > > > build time or unit test time. Also, I saw other PRs with the
    > > > > > > > > > same problem, e.g.
    > > > > > > > > >
    > > > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/433172088?utm_source=github_status&utm_medium=notification
    > > > > > > > > >
    > > > > > > > > > https://travis-ci.org/apache/incubator-mxnet/builds/434404305?utm_source=github_status&utm_medium=notification
    > > > > > > > > >
    > > > > > > > > > According to the timestamps from Travis, all the PRs that passed
    > > > > > > > > > contain small code changes and complete `make -j2` within 25s.
    > > > > > > > > > In the timeout cases, `make -j2` needs about 1600s. Does Travis
    > > > > > > > > > do an incremental build for each test? Shall we increase the
    > > > > > > > > > time limit for large PRs? Can we add more timestamps to the
    > > > > > > > > > build and unit test stages to help understand what's going on
    > > > > > > > > > there?
    > > > > > > > > >
    > > > > > > > > > Thanks in advance,
    > > > > > > > > > Zhennan
    > > > > > > > > >
    > > > > > > >
    > > > > > > >
    > > > > > > >
    > > > > > > > --
    > > > > > > > Yizhi Liu
    > > > > > > > DMLC member
    > > > > > > > Amazon Web Services
    > > > > > > > Vancouver, Canada
    > > > > > > >
    > > > > >
    > > > > >
    > > > > >
    > > > > > --
    > > > > > Yizhi Liu
    > > > > > DMLC member
    > > > > > Amazon Web Services
    > > > > > Vancouver, Canada
    > > > >
    > > > >
    > > > >
    > > > > --
    > > > > Yizhi Liu
    > > > > DMLC member
    > > > > Amazon Web Services
    > > > > Vancouver, Canada
    > > > >
    > > >
    > > --
    > > Yizhi Liu
    > > DMLC member
    > > Amazon Web Services
    > > Vancouver, Canada
    > >
    >
    


Re: Time out for Travis CI

Posted by kellen sunderland <ke...@gmail.com>.
Does the global timeout change for paid plans?  I looked into it briefly
but didn't see anything indicating that it does.

On Tue, Oct 2, 2018, 2:25 AM Pedro Larroy <pe...@gmail.com>
wrote:

> I think there are two approaches we can take to mitigate the build & test
> time problem: on the one hand, use a paid Travis CI plan; on the other,
> organize the unit tests into suites and only run a core set of tests, as we
> should do on devices, although in that case we reduce coverage.
>
> https://travis-ci.com/plans
>
> Pedro.