Posted to dev@lucene.apache.org by Steve Rowe <sa...@gmail.com> on 2016/02/02 14:32:45 UTC

Lucene's ASF Jenkins slave was stuck for over 5 days on a single job

I just killed this job: <https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/1085/>

Just before I killed it, the job's status was:

    Started 5 days 10 hr ago
    Build has been executing for 5 days 10 hr on lucene

and the last line of the console was:

[junit4] HEARTBEAT J1 PID(1824@lucene1-us-west): 2016-02-02T13:10:19, stalled for 460516s at: CollectionsAPIDistributedZkTest.test
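
For reference, the stall figure in that heartbeat line can be turned into something readable with a few lines of Python. This is only a sketch: the regular expression is an assumption derived from the single line quoted above, not a documented junit4 log format, and the names parse_heartbeat and HEARTBEAT_RE are made up for illustration. The timestamp carries no time zone, so the computed start time is in whatever zone the build machine logged in.

```python
import re
from datetime import datetime, timedelta

# Assumed pattern, inferred from the one HEARTBEAT line quoted in this thread.
HEARTBEAT_RE = re.compile(
    r"HEARTBEAT (?P<jvm>J\d+) PID\((?P<pid>\d+)@(?P<host>[^)]+)\): "
    r"(?P<ts>[^,]+), stalled for (?P<secs>\d+)s at: (?P<test>\S+)"
)


def parse_heartbeat(line):
    """Return the fields of a HEARTBEAT line as a dict, or None if it does not match."""
    m = HEARTBEAT_RE.search(line)
    if m is None:
        return None
    stalled = timedelta(seconds=int(m.group("secs")))
    reported = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S")
    return {
        "jvm": m.group("jvm"),
        "pid": int(m.group("pid")),
        "host": m.group("host"),
        "test": m.group("test"),
        "stalled": stalled,
        # Moment the test stopped making progress, in the log's own (unstated) zone.
        "stalled_since": reported - stalled,
    }


if __name__ == "__main__":
    line = ("[junit4] HEARTBEAT J1 PID(1824@lucene1-us-west): 2016-02-02T13:10:19, "
            "stalled for 460516s at: CollectionsAPIDistributedZkTest.test")
    info = parse_heartbeat(line)
    print(info["test"], "stalled for", info["stalled"], "since", info["stalled_since"])
```

460516 seconds works out to 5 days, 7 hours, 55 minutes and 16 seconds, so the test had been stalled for nearly all of the 5 days 10 hr the build had been running.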

Even though this is a nightly job, I think allowing it to run for more than a few hours is excessive.

--
Steve
www.lucidworks.com


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Lucene's ASF Jenkins slave was stuck for over 5 days on a single job

Posted by Dawid Weiss <da...@gmail.com>.
Like I thought, the forked JVM just terminated in the middle of
writing a log message:

[
 "APPEND_STDERR",
 {
  "chunk": "1230117 INFO  (qtp515704379-9839) [n:127.0.0.1:45276_
c:halfdeletedcollection2 s:shard3 r:core_node9
x:halfdeletedcollection2_shard3_replica1] o.a.s.u.UpdateHandler Usin


Dawid




Re: Lucene's ASF Jenkins slave was stuck for over 5 days on a single job

Posted by Dawid Weiss <da...@gmail.com>.
You'd have to check what actually happened to the forked JVM, Steve.
The heartbeat is emitted by the controller JVM; a forked JVM should
have terminated long before -- it probably hit a JVM error or
something else that prevented normal termination (timeout).

Occasionally we can't even get a stack trace from those zombie JVMs.
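
One generic way to guard against a child process that never exits is a hard deadline enforced by the parent. The sketch below illustrates that idea only; it is not how the junit4 runner or Jenkins handles forked JVMs, and run_with_deadline is a hypothetical helper name. A child that is wedged badly enough may still need a forced kill like the one used here, and no stack trace is collected.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd and wait at most timeout_s seconds.

    Returns (exit_code, timed_out). On timeout the child is killed
    forcibly and then reaped so it does not linger as a zombie.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL on POSIX, TerminateProcess on Windows
        proc.wait()   # reap the killed child
        return proc.returncode, True


if __name__ == "__main__":
    # Stand-in for a hung forked JVM: a child that sleeps far past the deadline.
    hung_child = [sys.executable, "-c", "import time; time.sleep(30)"]
    code, timed_out = run_with_deadline(hung_child, timeout_s=1.0)
    print("timed out:", timed_out, "exit code:", code)
```

The deadline lives in the parent, so it still fires when the child can no longer report anything about its own state.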

Dawid

