Posted to dev@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2011/08/08 18:18:02 UTC

failed unit test due to locked directory

Hi,
You may have noticed unit test failures with a message similar to the
following:
  testInfoServersRedirect(org.apache.hadoop.hbase.TestInfoServers): Cannot lock storage /home/hadoop/hbase/build/hbase/test/dfs/name1. The directory is already locked.
  testInfoServersStatusPages(org.apache.hadoop.hbase.TestInfoServers): Cannot lock storage /home/hadoop/hbase/build/hbase/test/dfs/name1. The directory is already locked.
This suggests that some JVMClusterUtil thread was still hanging around after
the underlying unit test finished.
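
For anyone unfamiliar with this kind of error: it is what an exclusive lock on
a storage directory produces when an earlier holder is still alive. Below is a
minimal sketch of such a directory lock using java.nio file locks. It is not
Hadoop's code; the class name, the lock file name and the in-JVM registry are
all made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileLock;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DirLockSketch {
    // Hypothetical lock file name, chosen for this sketch only.
    static final String LOCK_FILE = "sketch.lock";
    // Directories locked by this JVM; OS file locks are held per process,
    // so holders inside one JVM have to be tracked separately.
    private static final Set<String> HELD = ConcurrentHashMap.newKeySet();

    private String key;
    private RandomAccessFile file;
    private FileLock lock;

    /** Returns true if the lock was taken, false if the directory is already locked. */
    public boolean tryLock(File dir) {
        String path = dir.getAbsolutePath();
        if (!HELD.add(path)) {
            return false; // already locked by another holder in this JVM
        }
        boolean ok = false;
        RandomAccessFile raf = null;
        try {
            raf = new RandomAccessFile(new File(dir, LOCK_FILE), "rw");
            FileLock l = raf.getChannel().tryLock(); // null if another process holds it
            if (l != null) {
                key = path;
                file = raf;
                lock = l;
                ok = true;
            }
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (!ok) {
                HELD.remove(path);
                if (raf != null) {
                    try { raf.close(); } catch (IOException ignored) { }
                }
            }
        }
    }

    /** Releases the lock if held; does nothing otherwise. */
    public void release() {
        if (lock == null) {
            return;
        }
        try {
            lock.release();
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            HELD.remove(key);
            lock = null;
            file = null;
            key = null;
        }
    }
}
```

The point of the sketch: the second tryLock() on the same directory fails for
as long as the first holder has not called release() or its process has not
exited, which matches what a test sees when an earlier cluster is still up.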

I suggest making the following change to JVMClusterUtil:

Index: src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java	(revision 1154705)
+++ src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java	(working copy)
@@ -44,6 +44,7 @@
     public RegionServerThread(final HRegionServer r, final int index) {
       super(r, "RegionServer:" + index + ";" + r.getServerName());
       this.regionServer = r;
+      this.setDaemon(true);
     }

     /** @return the region server */
@@ -110,6 +111,7 @@
     public MasterThread(final HMaster m, final int index) {
       super(m, "Master:" + index + ";" + m.getServerName());
       this.master = m;
+      this.setDaemon(true);
     }

     /** @return the master */
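
To show what the setDaemon(true) calls change, here is a small standalone
sketch. ServerThread is a hypothetical stand-in for RegionServerThread and
MasterThread, not the real classes. The JVM exits once only daemon threads are
left, so main() below returns and the process ends even though the server
thread never finishes.

```java
public class DaemonThreadSketch {
    /** Hypothetical stand-in for RegionServerThread / MasterThread. */
    public static class ServerThread extends Thread {
        public ServerThread(Runnable server, String name) {
            super(server, name);
            // Has to be set before start(); the JVM does not wait for
            // daemon threads when deciding whether it can exit.
            setDaemon(true);
        }
    }

    public static void main(String[] args) {
        // A "server" that never returns on its own.
        Runnable neverReturns = new Runnable() {
            public void run() {
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        ServerThread t = new ServerThread(neverReturns, "RegionServer:0");
        t.start();
        System.out.println(t.getName() + " daemon=" + t.isDaemon());
        // main returns here and the JVM exits although the thread is still sleeping.
    }
}
```

Note that the daemon flag only lets the JVM exit; it does not shut the server
down cleanly, and anything the thread holds is released only when the process
goes away.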

Please comment.

Re: failed unit test due to locked directory

Posted by Ted Yu <yu...@gmail.com>.
Looks like grep, jps, etc. have to be invoked with their absolute paths.

On Mon, Aug 8, 2011 at 3:55 PM, Stack <st...@duboce.net> wrote:

> I added it. Should trigger on next trunk build.
> St.Ack

Re: failed unit test due to locked directory

Posted by Stack <st...@duboce.net>.
It didn't work (smile).  See the latest build.  I'll change it a bit
and retry the build.
St.Ack

On Mon, Aug 8, 2011 at 3:55 PM, Stack <st...@duboce.net> wrote:
> I added it. Should trigger on next trunk build.
> St.Ack

Re: failed unit test due to locked directory

Posted by Stack <st...@duboce.net>.
I added it. Should trigger on next trunk build.
St.Ack

On Mon, Aug 8, 2011 at 3:41 PM, Ted Yu <yu...@gmail.com> wrote:
> How about something like the following:
>
> $ ps aux | grep `jps | grep surefirebooter | awk '{print $1}'` | grep -i
> 'hbase' | grep 'target/surefire'
> hadoop    9259 34.6  1.8 2087596 594672 pts/0  Sl+  22:38   0:27
> /usr/java/jdk1.6.0_23/jre/bin/java -enableassertions -Xmx1400m -jar
> /home/hadoop/hbase/target/surefire/surefirebooter4503367661299017984.jar
> /home/hadoop/hbase/target/surefire/surefire574928600779662143tmp
> /home/hadoop/hbase/target/surefire/surefire5612437413429415234tmp
>
> We still need to distinguish the tests between 0.90 and TRUNK builds.
>
> Getting jstack on top of the above would be useful.

Re: failed unit test due to locked directory

Posted by Ted Yu <yu...@gmail.com>.
How about something like the following:

$ ps aux | grep `jps | grep surefirebooter | awk '{print $1}'` | grep -i 'hbase' | grep 'target/surefire'
hadoop    9259 34.6  1.8 2087596 594672 pts/0  Sl+  22:38   0:27
/usr/java/jdk1.6.0_23/jre/bin/java -enableassertions -Xmx1400m -jar
/home/hadoop/hbase/target/surefire/surefirebooter4503367661299017984.jar
/home/hadoop/hbase/target/surefire/surefire574928600779662143tmp
/home/hadoop/hbase/target/surefire/surefire5612437413429415234tmp

We still need to distinguish the tests between 0.90 and TRUNK builds.

Getting jstack on top of the above would be useful.
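
The same filter can be sketched in Java, in case a shell pipeline is awkward
to maintain in the Jenkins job. This is only a sketch and not what was added
to the build: it assumes Java 9 or later for ProcessHandle, the class and
method names are made up, and the command line may be unavailable for
processes owned by other users. Passing a different build root per job is one
hypothetical way to tell 0.90 and TRUNK forks apart.

```java
import java.util.ArrayList;
import java.util.List;

public class SurefireForkFinder {
    /** True if the command line looks like a surefire fork started under buildRoot. */
    public static boolean isSurefireFork(String commandLine, String buildRoot) {
        return commandLine.contains("surefirebooter")
            && commandLine.contains(buildRoot + "/target/surefire");
    }

    /** Returns "pid commandLine" for each matching process visible to this JVM. */
    public static List<String> find(String buildRoot) {
        List<String> hits = new ArrayList<String>();
        ProcessHandle.allProcesses().forEach(p -> {
            // commandLine() is empty when the platform does not expose it.
            String cmd = p.info().commandLine().orElse("");
            if (isSurefireFork(cmd, buildRoot)) {
                hits.add(p.pid() + " " + cmd);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        // The default build root is taken from the ps output above.
        String root = args.length > 0 ? args[0] : "/home/hadoop/hbase";
        for (String hit : find(root)) {
            System.out.println(hit);
        }
    }
}
```

Each pid it prints can then be handed to jstack.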

On Mon, Aug 8, 2011 at 3:30 PM, Stack <st...@duboce.net> wrote:

> On Mon, Aug 8, 2011 at 3:20 PM, Ted Yu <yu...@gmail.com> wrote:
> > BTW how can we know whether there were hanging surefire processes on
> Jenkins
> > ?
> >
>
> We can run a few shell commands before the build starts (I recently
> added printing out what the ulimit on the machine is).  Should we add
> listing of java processes (There's probably loads running on the box;
> would need to figure which were ours)?
>
> St.Ack
>

Re: failed unit test due to locked directory

Posted by Stack <st...@duboce.net>.
On Mon, Aug 8, 2011 at 3:20 PM, Ted Yu <yu...@gmail.com> wrote:
> BTW how can we know whether there were hanging surefire processes on Jenkins?
>

We can run a few shell commands before the build starts (I recently
added printing out what the ulimit on the machine is).  Should we add
a listing of java processes?  (There are probably loads running on the
box; we would need to figure out which are ours.)

St.Ack

Re: failed unit test due to locked directory

Posted by Ted Yu <yu...@gmail.com>.
I took a jstack dump (which I didn't keep) of a hanging surefire process and
saw RegionServerThread in it.
I should have examined the other threads in the trace.

I will do that next time I see a similar test failure.
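
For reference, the threads worth looking for in such a trace are the live
non-daemon ones, since those are what keep a JVM from exiting (jstack marks
daemon threads with the word "daemon" in the thread header). The sketch below
does the same check from inside a JVM. The class and helper names are made up,
and startBlocked() exists only to give the demo something to find.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class NonDaemonThreads {
    /** Names of live non-daemon threads other than the caller. */
    public static List<String> names() {
        List<String> result = new ArrayList<String>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                result.add(t.getName());
            }
        }
        return result;
    }

    /** Starts a thread that blocks until release is counted down (demo helper). */
    public static Thread startBlocked(String name, boolean daemon, final CountDownLatch release) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, name);
        t.setDaemon(daemon);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        startBlocked("RegionServer:0", false, release);
        startBlocked("daemon-worker", true, release);
        // Lists RegionServer:0 but not daemon-worker.
        System.out.println(names());
        release.countDown();
    }
}
```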

BTW how can we know whether there were hanging surefire processes on Jenkins?

On Mon, Aug 8, 2011 at 3:15 PM, Stack <st...@duboce.net> wrote:

> How does your suggested change relate to the lock Ted?  You are
> daemonizing hbase servers but seems like its an outstanding hdfs
> server that is the prob?
> St.Ack

Re: failed unit test due to locked directory

Posted by Stack <st...@duboce.net>.
How does your suggested change relate to the lock, Ted?  You are
daemonizing the hbase servers, but it seems like it's an outstanding hdfs
server that is the problem?
St.Ack

On Mon, Aug 8, 2011 at 9:18 AM, Ted Yu <yu...@gmail.com> wrote:
> Hi,
> You may have noticed unit test failures with message similar to the
> following:
>  testInfoServersRedirect(org.apache.hadoop.hbase.TestInfoServers): Cannot
> lock storage /home/hadoop/hbase/build/hbase/test/dfs/name1. The directory is
> already locked.
>  testInfoServersStatusPages(org.apache.hadoop.hbase.TestInfoServers):
> Cannot lock storage /home/hadoop/hbase/build/hbase/test/dfs/name1. The
> directory is already locked.
> This indicated that certain JVMClusterUtil was hanging after the underlying
> unit test finished.
>
> I suggest making the following change to JVMClusterUtil:
>
> Index: src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
> ===================================================================
> --- src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
> (revision 1154705)
> +++ src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
> (working copy)
> @@ -44,6 +44,7 @@
>     public RegionServerThread(final HRegionServer r, final int index) {
>       super(r, "RegionServer:" + index + ";" + r.getServerName());
>       this.regionServer = r;
> +      this.setDaemon(true);
>     }
>
>     /** @return the region server */
> @@ -110,6 +111,7 @@
>     public MasterThread(final HMaster m, final int index) {
>       super(m, "Master:" + index + ";" + m.getServerName());
>       this.master = m;
> +      this.setDaemon(true);
>     }
>
>     /** @return the master */
>
> Please comment.
>

Re: failed unit test due to locked directory

Posted by Ted Yu <yu...@gmail.com>.
We use MiniDFSCluster whose startDataNodes() calls:
      DataNode.runDatanodeDaemon(dn);
where I see this:
      dn.dataNodeThread.setDaemon(true); // needed for JUnit testing

FYI

On Mon, Aug 8, 2011 at 11:08 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> Wouldn't it be better if the DNs were daemons?
>
> J-D

Re: failed unit test due to locked directory

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Wouldn't it be better if the DNs were daemons?

J-D

On Mon, Aug 8, 2011 at 9:18 AM, Ted Yu <yu...@gmail.com> wrote:
> Hi,
> You may have noticed unit test failures with message similar to the
> following:
>  testInfoServersRedirect(org.apache.hadoop.hbase.TestInfoServers): Cannot
> lock storage /home/hadoop/hbase/build/hbase/test/dfs/name1. The directory is
> already locked.
>  testInfoServersStatusPages(org.apache.hadoop.hbase.TestInfoServers):
> Cannot lock storage /home/hadoop/hbase/build/hbase/test/dfs/name1. The
> directory is already locked.
> This indicated that certain JVMClusterUtil was hanging after the underlying
> unit test finished.
>
> I suggest making the following change to JVMClusterUtil:
>
> Index: src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
> ===================================================================
> --- src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
> (revision 1154705)
> +++ src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
> (working copy)
> @@ -44,6 +44,7 @@
>     public RegionServerThread(final HRegionServer r, final int index) {
>       super(r, "RegionServer:" + index + ";" + r.getServerName());
>       this.regionServer = r;
> +      this.setDaemon(true);
>     }
>
>     /** @return the region server */
> @@ -110,6 +111,7 @@
>     public MasterThread(final HMaster m, final int index) {
>       super(m, "Master:" + index + ";" + m.getServerName());
>       this.master = m;
> +      this.setDaemon(true);
>     }
>
>     /** @return the master */
>
> Please comment.
>