Posted to dev@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2011/10/08 19:04:35 UTC

Analyzing output of Jenkins build Was: Jenkins build is back to normal : HBase-TRUNK #2304

Scott:
Do you have time to write a script for analyzing the output of Jenkins and
put it on HBASE-4480?
Here is an idea from Ramkrishna:

Every line that contains "Running" can be parsed to check that the next
"Running" line appears one hop later.
For example, if the first "Running" is on the 11th line, the next "Running"
should be on the 13th.
If this pattern breaks somewhere, then that test is hanging.
This is just one idea. If we can figure out something better, we can take it
up.
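
Ramkrishna's idea can be sketched roughly as follows. This is a minimal
sketch and not the script attached to HBASE-4480. It assumes surefire console
output in which tests run one at a time and every finished test prints a
"Running <class>" line followed by a "Tests run: ..." summary, as in the
console excerpts quoted in this thread. Any "Running" line that is followed
by another "Running" line (or by the end of the log) before a summary shows
up is reported. The function name find_hung_tests is hypothetical.

```python
import re

# Line shapes as they appear in the surefire console output quoted in this thread.
RUNNING = re.compile(r"^Running\s+(\S+)\s*$")
SUMMARY = re.compile(r"^Tests run:\s*\d+")


def find_hung_tests(lines):
    """Return test classes whose "Running" line never got a "Tests run:" summary.

    Assumes tests are executed one at a time, so a summary line always
    belongs to the most recent "Running" line.
    """
    hung = []
    pending = None  # test class that has started but not yet reported
    for raw in lines:
        line = raw.strip()
        started = RUNNING.match(line)
        if started:
            if pending is not None:
                # A new test started before the previous one reported.
                hung.append(pending)
            pending = started.group(1)
        elif SUMMARY.match(line):
            pending = None
    if pending is not None:
        # The log ended while a test was still outstanding.
        hung.append(pending)
    return hung
```

Because the function accepts any iterable of lines, it can also be fed a
saved console log, for example find_hung_tests(open("console.txt")), where
console.txt is a placeholder file name. If tests ran in parallel, the
one-at-a-time assumption would no longer hold and the result would be
unreliable.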

Cheers

On Sat, Oct 8, 2011 at 9:53 AM, Jesse Yates <je...@gmail.com> wrote:

> The script to do this was written in 4480. Just needs some +1s a
> - It works pretty well.
>
> We might want to also mod it to take in a file that is the output of a run
> and analyze that.
>
> - Jesse Yates
>
> Sent from my iPhone.
>
> On Oct 8, 2011, at 2:51 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > Parsing test output will do.
> >
> >
> >
> > On Oct 7, 2011, at 11:44 PM, Akash Ashok <th...@gmail.com> wrote:
> >
> >> Hi Ted & Ram
> >>
> >> I just figured out the hung test case in both
> >>
> >> https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2303/console
> >> https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2304/console
> >>
> >> Running org.apache.hadoop.hbase.io.hfile.slab.TestSlabCache
> >> Running org.apache.hadoop.hbase.io.hfile.TestFixedFileTrailer
> >> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.858 sec
> >>
> >> TestSlabCache is the culprit.
> >>
> >> I just copied the output into Notepad++ and searched for "Running"; it
> >> highlighted the matches, which made the culprit easier to find :)
> >>
> >> And about the script: is the idea to parse this output and figure out the
> >> hung test case, or is there a plan to parse the surefire report XML?
> >>
> >> Cheers,
> >> Akash A
> >>
> >> On Sat, Oct 8, 2011 at 11:13 AM, Ted Yu <yu...@gmail.com> wrote:
> >>
> >>> Yeah, we need such a script.
> >>> I went over the tests in
> >>> https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2303/console
> >>> and couldn't find the hanging test.
> >>>
> >>> Cheers
> >>>
> >>> On Fri, Oct 7, 2011 at 10:33 PM, Ramakrishna S Vasudevan 00902313 <
> >>> ramakrishnas@huawei.com> wrote:
> >>>
> >>>> Ted
> >>>>
> >>>> We were once discussing a script to find hung tests, weren't we?
> >>>>
> >>>> Regards
> >>>> Ram
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>> From: Ted Yu <yu...@gmail.com>
> >>>> Date: Saturday, October 8, 2011 10:58 am
> >>>> Subject: Re: Jenkins build is back to normal : HBase-TRUNK #2304
> >>>> To: dev@hbase.apache.org
> >>>>
> >>>>> From
> >>>>> https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2303/console ,
> >>>>> it wasn't obvious which test(s) hung.
> >>>>> But the following error clearly indicated there was some hanging Java
> >>>>> process:
> >>>>>
> >>>>> [ERROR] Failed to execute goal
> >>>>> org.apache.maven.plugins:maven-surefire-plugin:2.9:test (default-test)
> >>>>> on project hbase: Failure or timeout -> [Help 1]
> >>>>> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to
> >>>>> execute goal org.apache.maven.plugins:maven-surefire-plugin:2.9:test
> >>>>> (default-test) on project hbase: Failure or timeout
> >>>>>
> >>>>> Unfortunately, we don't have access to the build machine.
> >>>>>
> >>>>> On Fri, Oct 7, 2011 at 10:14 PM, Akash Ashok
> >>>>> <th...@gmail.com> wrote:
> >>>>>
> >>>>>> Oh cool. Build is back to normal. Could someone tell me what the
> >>>>>> issue was?
> >>>>>> Why was it failing even though there were no failures?
> >>>>>>
> >>>>>> On Sat, Oct 8, 2011 at 4:45 AM, Apache Jenkins Server <
> >>>>>> jenkins@builds.apache.org> wrote:
> >>>>>>
> >>>>>>> See <https://builds.apache.org/job/HBase-TRUNK/2304/>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
>

Re: Analyzing output of Jenkins build Was: Jenkins build is back to normal : HBase-TRUNK #2304

Posted by Li Pi <lp...@ucsd.edu>.
top showed only one Java process. I'll try this again once I get back.

On Sun, Oct 9, 2011 at 9:24 PM, Todd Lipcon <to...@cloudera.com> wrote:
> That jstack just looks like the trace of the maven process - there
> should be another JVM which is actually running the tests.
>
> -Todd

Re: Analyzing output of Jenkins build Was: Jenkins build is back to normal : HBase-TRUNK #2304

Posted by Todd Lipcon <to...@cloudera.com>.
That jstack just looks like the trace of the maven process - there
should be another JVM which is actually running the tests.

-Todd

On Sat, Oct 8, 2011 at 10:14 AM, Li Pi <lp...@ucsd.edu> wrote:
> I got the thing to fail on my VMware box. Here's the stack trace.
>
> Doesn't look like the cache itself is hanging. The 4 runnable threads:
>
> "Attach Listener" daemon prio=10 tid=0x0000000001c48000 nid=0x4cac
> waiting on condition [0x0000000000000000]
>   java.lang.Thread.State: RUNNABLE
>
> "Thread-5" prio=10 tid=0x00007fb714117800 nid=0x4c03 runnable
> [0x00007fb720a1e000]
>   java.lang.Thread.State: RUNNABLE
>        at java.io.FileInputStream.readBytes(Native Method)
>        at java.io.FileInputStream.read(FileInputStream.java:236)
>        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
>        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
>        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
>        - locked <0x00000000f20403b0> (a java.io.InputStreamReader)
>        at java.io.InputStreamReader.read(InputStreamReader.java:184)
>        at java.io.BufferedReader.fill(BufferedReader.java:153)
>        at java.io.BufferedReader.readLine(BufferedReader.java:316)
>        - locked <0x00000000f20403b0> (a java.io.InputStreamReader)
>        at java.io.BufferedReader.readLine(BufferedReader.java:379)
>        at org.codehaus.plexus.util.cli.StreamPumper.run(StreamPumper.java:129)
>
> "Thread-4" prio=10 tid=0x00007fb714114800 nid=0x4c01 runnable
> [0x00007fb720e36000]
>   java.lang.Thread.State: RUNNABLE
>        at java.io.FileInputStream.readBytes(Native Method)
>        at java.io.FileInputStream.read(FileInputStream.java:236)
>        at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>        - locked <0x00000000f25c6ce8> (a java.io.BufferedInputStream)
>        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
>        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
>        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
>        - locked <0x00000000f203d858> (a java.io.InputStreamReader)
>        at java.io.InputStreamReader.read(InputStreamReader.java:184)
>        at java.io.BufferedReader.fill(BufferedReader.java:153)
>        at java.io.BufferedReader.readLine(BufferedReader.java:316)
>        - locked <0x00000000f203d858> (a java.io.InputStreamReader)
>        at java.io.BufferedReader.readLine(BufferedReader.java:379)
>        at org.codehaus.plexus.util.cli.StreamPumper.run(StreamPumper.java:129)
>
> "process reaper" daemon prio=10 tid=0x00007fb71401e800 nid=0x4bfe
> runnable [0x00007fb720c34000]
>   java.lang.Thread.State: RUNNABLE
>        at java.lang.UNIXProcess.waitForProcessExit(Native Method)
>        at java.lang.UNIXProcess.access$900(UNIXProcess.java:36)
>        at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:148)
>
>
> Looks like FileInputStream.readBytes() is blocking.



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Analyzing output of Jenkins build Was: Jenkins build is back to normal : HBase-TRUNK #2304

Posted by Li Pi <lp...@ucsd.edu>.
I got the thing to fail on my VMware box. Here's the stack trace.

Doesn't look like the cache itself is hanging. The 4 runnable threads:

"Attach Listener" daemon prio=10 tid=0x0000000001c48000 nid=0x4cac
waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Thread-5" prio=10 tid=0x00007fb714117800 nid=0x4c03 runnable
[0x00007fb720a1e000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:236)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
	- locked <0x00000000f20403b0> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:153)
	at java.io.BufferedReader.readLine(BufferedReader.java:316)
	- locked <0x00000000f20403b0> (a java.io.InputStreamReader)
	at java.io.BufferedReader.readLine(BufferedReader.java:379)
	at org.codehaus.plexus.util.cli.StreamPumper.run(StreamPumper.java:129)

"Thread-4" prio=10 tid=0x00007fb714114800 nid=0x4c01 runnable
[0x00007fb720e36000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:236)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	- locked <0x00000000f25c6ce8> (a java.io.BufferedInputStream)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
	- locked <0x00000000f203d858> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:153)
	at java.io.BufferedReader.readLine(BufferedReader.java:316)
	- locked <0x00000000f203d858> (a java.io.InputStreamReader)
	at java.io.BufferedReader.readLine(BufferedReader.java:379)
	at org.codehaus.plexus.util.cli.StreamPumper.run(StreamPumper.java:129)

"process reaper" daemon prio=10 tid=0x00007fb71401e800 nid=0x4bfe
runnable [0x00007fb720c34000]
   java.lang.Thread.State: RUNNABLE
	at java.lang.UNIXProcess.waitForProcessExit(Native Method)
	at java.lang.UNIXProcess.access$900(UNIXProcess.java:36)
	at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:148)


Looks like FileInputStream.readBytes() is blocking.


On Sat, Oct 8, 2011 at 10:04 AM, Ted Yu <yu...@gmail.com> wrote:
> Scott:
> Do you have time to write a script for analyzing the output of Jenkins and
> put it on HBASE-4480?
> Here is an idea from Ramkrishna:
>
> Every line that contains "Running" can be parsed to check that the next
> "Running" line appears one hop later.
> For example, if the first "Running" is on the 11th line, the next "Running"
> should be on the 13th.
> If this pattern breaks somewhere, then that test is hanging.
> This is just one idea. If we can figure out something better, we can take it
> up.
>
> Cheers

Re: Analyzing output of Jenkins build Was: Jenkins build is back to normal : HBase-TRUNK #2304

Posted by Ted Yu <yu...@gmail.com>.
Take a look at HBASE-4560, where Ramkrishna posted a script.
The script found that TestShell hung in builds 2314 and 2316.

Cheers

On Mon, Oct 10, 2011 at 10:30 PM, <sk...@kuehns.com> wrote:

> > Scott:
> > Do you have time to write a script for analyzing the output of Jenkins
> > and put it on HBASE-4480?
>
> I'll take a shot and post something on 4480.
>
> > Here is an idea from Ramkrishna:
> >
> > Every line that contains "Running" can be parsed to check that the next
> > "Running" line appears one hop later.
> > For example, if the first "Running" is on the 11th line, the next
> > "Running" should be on the 13th.
> > If this pattern breaks somewhere, then that test is hanging.
>
> If you come across one of these examples in the next day or so, please
> send it to me.
>
> > This is just one idea. If we can figure out something better, we can
> > take it up.
> >
> > Cheers

Re: Analyzing output of Jenkins build Was: Jenkins build is back to normal : HBase-TRUNK #2304

Posted by sk...@kuehns.com.
> Scott:
> Do you have time to write a script for analyzing the output of Jenkins and
> put it on HBASE-4480?

I'll take a shot and post something on 4480.

> Here is an idea from Ramkrishna:
>
> Every line that contains "Running" can be parsed to check that the next
> "Running" line appears one hop later.
> For example, if the first "Running" is on the 11th line, the next
> "Running" should be on the 13th.
> If this pattern breaks somewhere, then that test is hanging.

If you come across one of these examples in the next day or so, please
send it to me.

> This is just one idea. If we can figure out something better, we can
> take it up.
>
> Cheers