Posted to dev@spark.apache.org by Patrick Wendell <pw...@gmail.com> on 2014/08/15 18:04:26 UTC

Tests failing

Hi All,

I noticed that all PR tests run overnight had failed due to timeouts. I
believe the patch that updates the netty shuffle somehow inflated the
build time significantly. That patch had been tested, but one untested
change was made before it was merged.

I've reverted the patch for now to see if it brings the build times back
down.

- Patrick

Re: Tests failing

Posted by Nicholas Chammas <ni...@gmail.com>.
*Bam. <https://github.com/apache/spark/pull/1974#issuecomment-52368527>*


On Fri, Aug 15, 2014 at 5:04 PM, Patrick Wendell <pw...@gmail.com> wrote:

> Yeah I was thinking something like that. Basically we should just have a
> variable for the timeout and I can make sure it's under the configured
> Jenkins time.

Re: Tests failing

Posted by Patrick Wendell <pw...@gmail.com>.
Yeah I was thinking something like that. Basically we should just have a
variable for the timeout and I can make sure it's under the configured
Jenkins time.
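
A minimal sketch of the "variable for the timeout" idea, in Python rather than the shell used by the real scripts. The names and the 5-minute reporting margin are hypothetical; only the 120-minute Jenkins cap and the roughly 100-minute test limit come from this thread.

```python
# Hypothetical names; only the two figures below come from the thread.
JENKINS_BUILD_CAP_MINUTES = 120   # Jenkins kills the whole build at this point
TESTS_TIMEOUT_MINUTES = 100       # proposed limit for the run-tests step


def check_timeout_budget(tests_timeout, jenkins_cap, reporting_margin=5):
    """Return the minutes left for reporting, or raise if there are too few.

    reporting_margin is an assumed allowance for posting the result
    before Jenkins kills the build.
    """
    slack = jenkins_cap - tests_timeout
    if slack < reporting_margin:
        raise ValueError(
            f"tests timeout of {tests_timeout} min leaves only {slack} min "
            f"before the {jenkins_cap} min Jenkins cap"
        )
    return slack


print(check_timeout_budget(TESTS_TIMEOUT_MINUTES, JENKINS_BUILD_CAP_MINUTES))
```

Keeping the limit in one variable means the check fails loudly if someone later raises the test timeout past what the Jenkins configuration allows.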


On Fri, Aug 15, 2014 at 1:55 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> So 2 hours is a hard cap on how long a build can run. Okie doke.
>
> Perhaps then I'll wrap the run-tests step as you suggest and limit it to
> 100 minutes or something, and cleanly report if it times out.
>
> Sound good?

Re: Tests failing

Posted by Nicholas Chammas <ni...@gmail.com>.
So 2 hours is a hard cap on how long a build can run. Okie doke.

Perhaps then I'll wrap the run-tests step as you suggest and limit it to
100 minutes or something, and cleanly report if it times out.

Sound good?


On Fri, Aug 15, 2014 at 4:43 PM, Patrick Wendell <pw...@gmail.com> wrote:

> Hey Nicholas,
>
> Yeah, so Jenkins has its own timeout mechanism and it will just kill the
> entire build after 120 minutes. But since run-tests is sitting in the
> middle of the tests, it can't actually post a failure message.
>
> I think run-tests-jenkins should just wrap the call to run-tests in its
> own timeout. It might be possible to just use this:
>
> http://linux.die.net/man/1/timeout
>
> - Patrick

Re: Tests failing

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Nicholas,

Yeah, so Jenkins has its own timeout mechanism and it will just kill the
entire build after 120 minutes. But since the build is killed while
run-tests is still in the middle of the tests, the script can't actually
post a failure message.

I think run-tests-jenkins should just wrap the call to run-tests in its
own timeout. It might be possible to just use this:

http://linux.die.net/man/1/timeout

- Patrick
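
A rough sketch of the wrapping idea, written in Python instead of the shell that run-tests-jenkins uses. The function name and the stand-in command are hypothetical. The status 124 mirrors what GNU timeout(1) returns when a command times out; reusing it here is an illustrative choice, not something the thread settles.

```python
import subprocess
import sys

TIMED_OUT = 124  # same status GNU timeout(1) uses when the command times out


def run_with_timeout(command, timeout_seconds):
    """Run command; return its exit status, or TIMED_OUT if it ran too long."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so the caller
        # is still alive here and free to report the timeout.
        return TIMED_OUT
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for the real test run: a child process that sleeps too long.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    status = run_with_timeout(slow, timeout_seconds=1)
    print("timed out" if status == TIMED_OUT else f"exit status {status}")
```

Note that subprocess.run() only kills the direct child. A real test run starts further processes of its own, so a production version would probably need to terminate the whole process group.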


On Fri, Aug 15, 2014 at 1:31 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> OK, I've captured this in SPARK-3076
> <https://issues.apache.org/jira/browse/SPARK-3076>.
>
> Patrick,
>
> Is the problem that this run-tests
> <https://github.com/apache/spark/blob/0afe5cb65a195d2f14e8dfcefdbec5dac023651f/dev/run-tests-jenkins#L151> step
> times out, and that is currently not handled gracefully? To be more
> specific, it hangs for 120 minutes, times out, but the parent script for
> some reason is also terminated. Does that sound right?
>
> Nick

Re: Tests failing

Posted by Nicholas Chammas <ni...@gmail.com>.
OK, I've captured this in SPARK-3076
<https://issues.apache.org/jira/browse/SPARK-3076>.

Patrick,

Is the problem that this run-tests
<https://github.com/apache/spark/blob/0afe5cb65a195d2f14e8dfcefdbec5dac023651f/dev/run-tests-jenkins#L151> step
times out, and that is currently not handled gracefully? To be more
specific, it hangs for 120 minutes, times out, but the parent script for
some reason is also terminated. Does that sound right?

Nick


On Fri, Aug 15, 2014 at 3:33 PM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

> Jenkins runs for this PR https://github.com/apache/spark/pull/1960 timed
> out without notification. The relevant Jenkins logs are at
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18588/consoleFull
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18592/consoleFull
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18597/consoleFull

Re: Tests failing

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
Jenkins runs for this PR https://github.com/apache/spark/pull/1960 timed
out without notification. The relevant Jenkins logs are at

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18588/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18592/consoleFull
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18597/consoleFull


On Fri, Aug 15, 2014 at 11:44 AM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> Shivaram,
>
> Can you point us to an example of that happening? The Jenkins console
> output, that is.
>
> Nick

Re: Tests failing

Posted by Patrick Wendell <pw...@gmail.com>.
We'll need to build timeouts into our own reporting infrastructure - it
shouldn't be too bad but we just need to script it. Unfortunately the
Jenkins plug-in is either "all or nothing" in what it reports, so we can't
have it report timeouts unless we want all the other fairly noisy messages
from it.
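
One way the scripted reporting could look, as a hedged Python sketch. The function names, the 124 timeout status, and the placeholder token are assumptions for illustration; the request targets GitHub's issue-comment endpoint (pull request comments are posted through it) and is only built here, never sent.

```python
import json
import urllib.request

TIMED_OUT = 124  # assumed status meaning "tests exceeded their time limit"


def result_message(status, timeout_minutes):
    """Turn the test run's exit status into a one-line PR comment."""
    if status == 0:
        return "Tests passed."
    if status == TIMED_OUT:
        return f"Tests timed out after {timeout_minutes} minutes."
    return f"Tests failed with exit status {status}."


def build_comment_request(repo, pr_number, message, token):
    """Build (but do not send) a POST to GitHub's issue-comment endpoint."""
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    return urllib.request.Request(
        url,
        data=json.dumps({"body": message}).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# PR 1960 is the one mentioned in this thread; the token is a placeholder.
request = build_comment_request(
    "apache/spark", 1960, result_message(TIMED_OUT, 100), "PLACEHOLDER"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Because the script chooses the message itself, a timeout can be reported without turning on the plug-in's other messages.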


On Fri, Aug 15, 2014 at 11:44 AM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> Shivaram,
>
> Can you point us to an example of that happening? The Jenkins console
> output, that is.
>
> Nick

Re: Tests failing

Posted by Nicholas Chammas <ni...@gmail.com>.
Shivaram,

Can you point us to an example of that happening? The Jenkins console
output, that is.

Nick


On Fri, Aug 15, 2014 at 2:28 PM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

> Also, I think Jenkins doesn't post build timeouts to GitHub. Is there any way
> we can fix that?

Re: Tests failing

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
Also, I think Jenkins doesn't post build timeouts to GitHub. Is there any way
we can fix that?
On Aug 15, 2014 9:04 AM, "Patrick Wendell" <pw...@gmail.com> wrote:

> Hi All,
>
> I noticed that all PR tests run overnight had failed due to timeouts. I
> believe the patch that updates the netty shuffle somehow inflated the
> build time significantly. That patch had been tested, but one untested
> change was made before it was merged.
>
> I've reverted the patch for now to see if it brings the build times back
> down.
>
> - Patrick
>