Posted to user@hbase.apache.org by "George P. Stathis" <gs...@traackr.com> on 2011/04/13 00:23:43 UTC

HBase 0.90.2 CDH3B4 - regions infinitely stuck in transition?

We are in the middle of upgrading our dev environment from 0.89 to
0.90.2 CDH3B4. When we did the upgrade locally (Macs), no issues came up.
It seems to be a different story on our EC2 dev box.

Background:
- dev is running in pseudo-cluster mode
- we neglected to change replication from 2 to 1 the first time we started
it, but we shut it down and fixed that setting

It seems now that some regions are perpetually stuck in transition mode:
https://gist.github.com/916562

Looked at https://issues.apache.org/jira/browse/HBASE-3406 and
https://issues.apache.org/jira/browse/HBASE-3637 trying to find similarities
but I'm not sure it's quite the same issue.

hbase hbck -fix does not seem to rectify the problem. Here is its output:
https://gist.github.com/916567

Any pointers are appreciated. Happy to give more info.

-GS

Re: HBase 0.90.2 CDH3B4 - regions infinitely stuck in transition?

Posted by Vadim Keylis <vk...@gmail.com>.

Good afternoon St.Ack. I've read the article and configured things the way
it describes, but I still get the same error.

2011-04-18 15:09:11,635 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=user,10L\xA2\x13\xDB\xDB\xDB\xDB\xDA\xDA\xB0\x05Z\xCCw!\xCC\x93\xB0!\xCC\x93\x93\xB0!\xCCw!\xCCw!\xCCw
,1303164462772.dbbb665cfc461075670f06f4dfe6b632.
java.io.IOException: Compression algorithm 'lzo' previously failed test.
        at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:77)
        at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2555)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2544)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2532)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:262)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)


I checked out hadoop-lzo from the link you provided and rebuilt the library
following the instructions in the FAQ.

Here is my configuration:

hadoop-env.sh:

   export JAVA_LIBRARY_PATH="/home/hbase/hadoop-lzo-0.4.10/lib/native/Linux-amd64-64/"
   # Extra Java CLASSPATH elements.  Optional.
   export HADOOP_CLASSPATH="/home/hbase/hadoop-lzo-0.4.10/hadoop-lzo-0.4.10.jar"


core-site.xml

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

hbase-env.sh:

   export HBASE_CLASSPATH="/home/hbase/hadoop-lzo-0.4.10/hadoop-lzo-0.4.10.jar:$HBASE_CLASSPATH"
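
A quick sanity check for a setup like this, offered as a sketch with assumptions rather than a confirmed fix: hadoop-env.sh is read by Hadoop's own scripts, so it may be worth confirming separately that the region server JVM can see the native library. The helper name check_lzo_native is made up for illustration; libgplcompression is the native library that hadoop-lzo builds. The commented-out CompressionTest call is HBase's own codec self-test; its path argument is a placeholder, and the accepted form of that argument may differ between versions.

```shell
# Hypothetical helper: report whether a directory holds the hadoop-lzo
# native library (libgplcompression) that the JVM loads through JNI.
check_lzo_native() {
  dir="$1"
  if ls "$dir"/libgplcompression.so* >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

# Example, using the directory from the hadoop-env.sh above:
#   check_lzo_native /home/hbase/hadoop-lzo-0.4.10/lib/native/Linux-amd64-64
#
# HBase's codec self-test, run as the user that starts the region server
# (the path is a placeholder, not a value from this thread):
#   hbase org.apache.hadoop.hbase.util.CompressionTest /tmp/lzo-check lzo
```

If the helper prints "found" but the self-test still fails, the library is probably present but not on the library path of the JVM that HBase starts.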


On Thu, Apr 14, 2011 at 9:09 PM, Stack <st...@duboce.net> wrote:

> Vadim:
>
> You've read this https://github.com/toddlipcon/hadoop-lzo?
>
> St.Ack

Re: HBase 0.90.2 CDH3B4 - regions infinitely stuck in transition?

Posted by Stack <st...@duboce.net>.

Vadim:

You've read this https://github.com/toddlipcon/hadoop-lzo?

St.Ack

On Thu, Apr 14, 2011 at 8:39 PM, Vadim Keylis <vk...@gmail.com> wrote:
> Where lzo lib belong because I have similar problem and was not able to solve. Help is appreciated
>
> Sent from my iPhone
>
> Vadim

Re: HBase 0.90.2 CDH3B4 - regions infinitely stuck in transition?

Posted by Vadim Keylis <vk...@gmail.com>.

Where do the LZO libs belong? I have a similar problem and was not able to solve it. Any help is appreciated.

Vadim

On Apr 12, 2011, at 4:50 PM, "George P. Stathis" <gs...@traackr.com> wrote:

> Ah!! I always forget to check the region server log:
> 
> java.io.IOException: Compression algorithm 'lzo' previously failed test.
> at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:77)
> at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2555)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2544)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2532)
> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:262)
> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> 
> Our upgrade script unpacked the LZO libs in the wrong place. I put them back
> where they should have been and the problem resolved itself. Thanks J-D!

Re: HBase 0.90.2 CDH3B4 - regions infinitely stuck in transition?

Posted by "George P. Stathis" <gs...@traackr.com>.

Ah!! I always forget to check the region server log:

java.io.IOException: Compression algorithm 'lzo' previously failed test.
        at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:77)
        at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2555)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2544)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2532)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:262)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Our upgrade script unpacked the LZO libs in the wrong place. I put them back
where they should have been and the problem resolved itself. Thanks J-D!
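
Since the failure only showed up in the region server log, a small filter like the following can surface it quickly. This is a sketch: find_open_failures is a hypothetical helper, the patterns are taken from the log lines quoted in this thread, and the log location in the usage comment is an assumption about a default install.

```shell
# Hypothetical helper: print region-open failures and compression codec
# errors from one or more region server logs, with line numbers.
find_open_failures() {
  grep -n -E "Failed open of region=|Compression algorithm '[a-z0-9]+' previously failed test" "$@"
}

# Example (log path and file naming are assumptions):
#   find_open_failures "$HBASE_HOME"/logs/hbase-*-regionserver-*.log
```

A region that keeps reappearing in this output is failing on the region server itself, which would explain why it stays in transition on the master side.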

On Tue, Apr 12, 2011 at 6:38 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Could you upgrade to the newly released CDH3 instead? It has a few more
> fixes.
>
> So regarding your issue, I don't see regions stuck. The first one did
> timeout on opening but then it was reassigned (and then I can't see
> anything in the log that says it timed out again).
>
> By the way can you check what the region server was doing instead of
> opening it? Maybe it just has too many to open and it took some time
> to get it opened? I've seen that on our clusters but it eventually
> gets ok.
>
> J-D

Re: HBase 0.90.2 CDH3B4 - regions infinitely stuck in transition?

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Could you upgrade to the newly released CDH3 instead? It has a few more fixes.

So regarding your issue, I don't see regions stuck. The first one did
time out on opening but then it was reassigned (and I can't see
anything in the log that says it timed out again).

By the way, can you check what the region server was doing instead of
opening it? Maybe it just had too many regions to open and it took some
time to get it opened? I've seen that on our clusters but it eventually
sorts itself out.

J-D

On Tue, Apr 12, 2011 at 3:23 PM, George P. Stathis <gs...@traackr.com> wrote:
> In the middle of upgrading our dev environment from 0.89 to 0.90.2CDH3B4.
> When we did the upgrade locally (Macs), no issues came up. Different story
> on our EC2 dev box it seems.
>
> Background:
> - dev is running in pseudo-cluster mode
> - we neglected to set replication to 1 from 2 the first time we started it
> but we shut it off and fixed that setting
>
> It seems now that some regions are perpetually stuck in transition mode:
> https://gist.github.com/916562
>
> Looked at https://issues.apache.org/jira/browse/HBASE-3406 and
> https://issues.apache.org/jira/browse/HBASE-3637 trying to find similarities
> but I'm not sure it's quite the same issue.
>
> hbase hbck -fix does not seem to rectify the problem. Here is its output:
> https://gist.github.com/916567
>
> Any pointers are appreciated. Happy to give more info.
>
> -GS
>