Posted to dev@systemml.apache.org by Frederick R Reiss <fr...@us.ibm.com> on 2016/12/09 00:49:19 UTC

Re: test suite running slowly after disable cache/sparse commit?

+dev list

I personally don't mind letting the regression suite run overnight. The
important thing is that we do not push changes that have not passed the
full automated test suite. In the interest of efficiency, we shouldn't even
be reviewing most PRs until after they have passed the automated tests.

Deron, are you seeing a backlog of not-yet-started builds queueing up on
the PR build server? If the queue is getting long, we can add additional
machines to the Jenkins cluster.

Fred



From:	Deron Eriksson/San Francisco/IBM
To:	Niketan Pansare/Almaden/IBM@IBMUS
Cc:	Berthold Reinwald/Almaden/IBM@IBMUS, Frederick R
            Reiss/Almaden/IBM@IBMUS
Date:	12/08/2016 11:06 AM
Subject:	Re: test suite running slowly after disable cache/sparse
            commit?



Hi Niketan,

Perhaps Berthold or Fred could add a little guidance here in terms of what
is acceptable? Having the test suite go from 2:21 to 3:41 (one pull request
yesterday took 4:11 to complete -
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/909/)
is very serious to me. Even if the test suite runs at 3:00, this is a
serious slowdown. It slows down our ability to validate pull requests and
other code on jenkins.

Deron


 ----- Original message -----
 From: Niketan Pansare/Almaden/IBM
 To: Deron Eriksson/San Francisco/IBM@ibmus
 Cc: Berthold Reinwald/Almaden/IBM@ibmus, Frederick R
 Reiss/Almaden/IBM@ibmus
 Subject: Re: test suite running slowly after disable cache/sparse commit?
 Date: Thu, Dec 8, 2016 8:55 AM

 Hi Deron,

 The commit replicated the application tests so that they also run with
 sparse disabled and with caching disabled, so the test time is expected
 to increase. We should either increase the duration or reduce the number
 of application tests that we run with caching and sparse disabled.

 Thanks

 Niketan

 On Dec 8, 2016, at 7:47 AM, Deron Eriksson <de...@us.ibm.com> wrote:

       Hi Niketan,

       I noticed the daily test yesterday timed out, probably because of a
       long-running test.

       Looking at the commits from the day before (
       https://github.com/apache/incubator-systemml/commits/master), I
       noticed that [SYSTEMML-769] [SYSTEMML-1140] Removed -disable-caching
       and -disable-… (
       https://github.com/apache/incubator-systemml/commit/caaaec90b61e529e50021d89f9f108230fa307a8
       ) updated some of the tests.

       So I ran the tests on the previous commit (
       https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/227/) and
       the tests ran in 2hr 21min.

       I ran the tests on the 'disable caching...' commit (
       https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/228/) and
       the tests ran in 3hr 41min.

       One thing that confuses me is that the nightly test just
       completed successfully (
       https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/674/) in
       2hr 57min and did not time out like yesterday afternoon's run. So a
       server issue is also possible.

       Could you look into this and see if that commit introduced an issue
       with the tests?

       Thanks!
       Deron
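
To put the run times quoted above side by side, here is a minimal sketch of the arithmetic. The durations and Jenkins job numbers come from the messages above; the helper name `slowdown` and the dictionary layout are only illustrative. The pull request build may also have included unrelated changes, so its comparison against the baseline is rough.

```python
from datetime import timedelta

# Durations as quoted in the thread.
BASELINE = timedelta(hours=2, minutes=21)  # previous commit, OnDemand/227
RUNS = {
    "OnDemand/228 ('disable caching...' commit)": timedelta(hours=3, minutes=41),
    "DailyTest/674 (nightly)": timedelta(hours=2, minutes=57),
    "PullRequestBuilder/909": timedelta(hours=4, minutes=11),
}


def slowdown(baseline, current):
    """Return (extra whole minutes, percent increase) of current over baseline."""
    extra = current - baseline
    extra_minutes = int(extra.total_seconds() // 60)
    percent = 100.0 * (extra / baseline)  # timedelta / timedelta gives a float
    return extra_minutes, percent


for name, duration in RUNS.items():
    minutes, percent = slowdown(BASELINE, duration)
    print(f"{name}: +{minutes} min ({percent:.1f}% slower than baseline)")
```

By this arithmetic the on-demand run on the 'disable caching...' commit is 80 minutes (about 57%) slower than the previous commit, while the nightly run is 36 minutes (about 26%) slower.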





Re: test suite running slowly after disable cache/sparse commit?

Posted by Deron Eriksson <de...@gmail.com>.
Hi,

It looks like we had another timeout on the daily build:
https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/677/console

Deron


On Thu, Dec 8, 2016 at 9:59 PM, Acs S <ac...@yahoo.com.invalid> wrote:

> +1 On adding Jenkins Build machines on PR builds.
> Couple of times I hit waiting PR builds due to queue. If that is not
> common, we can wait.
> -Arvind



-- 
Deron Eriksson
Spark Technology Center
http://www.spark.tc/

Re: test suite running slowly after disable cache/sparse commit?

Posted by Acs S <ac...@yahoo.com.INVALID>.
+1 on adding Jenkins build machines for PR builds.
A couple of times my PR builds had to wait in the queue. If that is not common, we can wait.
-Arvind

From: Deron Eriksson <de...@gmail.com>
To: dev@systemml.incubator.apache.org
Sent: Friday, December 9, 2016 7:34 AM
Subject: Re: test suite running slowly after disable cache/sparse commit?

Hi Fred,

The last two daily tests ran in about 2 hr 56 min, so if this number is
stable, the new tests seem to add about half an hour to the test suite
time. I would like it if we could decrease the test suite time rather than
add significantly to it. In fact, I'd personally prefer to move the
time-consuming algorithm-type tests out of the main test suite and run
them only daily (if this is technically possible). That way, the main test
suite would run significantly faster while we still benefit from the daily
coverage provided by the algorithm tests. I like the idea of a short test
suite time since that makes it easier to get feedback and continue working
on an issue the same day. If the tests take too long to run, issues that
could be solved in one day get pushed out to another day.

Increasing the number of simultaneous Jenkins jobs allowed could help with
queued-up builds, which would be nice. Currently Jenkins runs a max of two
simultaneous jobs. Jenkins currently handles:
1) two daily builds (at noon and at midnight)
2) on-demand builds (so a developer can commit some code on a branch and
then have jenkins build/test so that a developer's machine isn't tied up)
3) pull request builds (the initial push with a PR will trigger this along
with any subsequent pushes to the branch referenced by the PR).

Today there is no queue, but I'm the only person who has triggered a PR
build today. If more than two developers submit PRs on the same day, there
will be a queue. This queue has been manageable, but if the increase in
test suite time is permanent, I'd recommend bumping the number of
simultaneous Jenkins jobs from two to four.

Deron




   

Re: test suite running slowly after disable cache/sparse commit?

Posted by Deron Eriksson <de...@gmail.com>.
Hi Fred,

The last two daily tests ran in about 2 hr 56 min, so if this number is
stable, the new tests seem to add about half an hour to the test suite
time. I would like it if we could decrease the test suite time rather than
add significantly to it. In fact, I'd personally prefer to move the
time-consuming algorithm-type tests out of the main test suite and run
them only daily (if this is technically possible). That way, the main test
suite would run significantly faster while we still benefit from the daily
coverage provided by the algorithm tests. I like the idea of a short test
suite time since that makes it easier to get feedback and continue working
on an issue the same day. If the tests take too long to run, issues that
could be solved in one day get pushed out to another day.

Increasing the number of simultaneous Jenkins jobs allowed could help with
queued-up builds, which would be nice. Currently Jenkins runs a max of two
simultaneous jobs. Jenkins currently handles:
1) two daily builds (at noon and at midnight)
2) on-demand builds (so a developer can commit some code on a branch and
then have jenkins build/test so that a developer's machine isn't tied up)
3) pull request builds (the initial push with a PR will trigger this along
with any subsequent pushes to the branch referenced by the PR).

Today there is no queue, but I'm the only person who has triggered a PR
build today. If more than two developers submit PRs on the same day, there
will be a queue. This queue has been manageable, but if the increase in
test suite time is permanent, I'd recommend bumping the number of
simultaneous Jenkins jobs from two to four.

Deron
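
As a rough illustration of the queueing argument above, the sketch below simulates how long builds wait when Jenkins runs two versus four simultaneous jobs. It is only a toy model: the arrival times are made up (two daily builds at midnight and noon plus a few PR and on-demand builds in the morning), every build is assumed to take a fixed 3 hours, jobs are served first come, first served, and `simulate_waits` is a hypothetical helper, not anything from Jenkins itself.

```python
import heapq

# Hypothetical arrival times in hours since midnight: the midnight daily build,
# four PR/on-demand builds during the morning, and the noon daily build.
ARRIVALS = [0.0, 9.0, 9.5, 10.0, 10.5, 12.0]
BUILD_HOURS = 3.0  # assumed fixed duration of one full test-suite run


def simulate_waits(arrivals, duration, executors):
    """Return how long each job waits in the queue (first come, first served)."""
    # Min-heap of the times at which each executor next becomes free.
    free_at = [0.0] * executors
    heapq.heapify(free_at)
    waits = []
    for arrival in sorted(arrivals):
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + duration)
    return waits


for executors in (2, 4):
    waits = simulate_waits(ARRIVALS, BUILD_HOURS, executors)
    print(f"{executors} simultaneous jobs: waits (hours) = {waits}")
```

Under these made-up arrivals, two simultaneous jobs leave the last three builds waiting 2, 2, and 3 hours, while four simultaneous jobs leave no build waiting. Real arrival patterns and build times will differ, so the numbers only show the shape of the trade-off.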



On Thu, Dec 8, 2016 at 4:49 PM, Frederick R Reiss <fr...@us.ibm.com>
wrote:

> +dev list
>
> I personally don't mind letting the regression suite run overnight. The
> important thing is that we do not push changes that have not passed the
> full automated test suite. In the interest of efficiency, we shouldn't even
> be reviewing most PRs until after they have passed the automated tests.
>
> Deron, are you seeing a backlog of not-yet-started builds queueing up on
> the PR build server? If the queue is getting long, we can add additional
> machines to the Jenkins cluster.
>
> Fred


-- 
Deron Eriksson
Spark Technology Center
http://www.spark.tc/