Posted to user@hbase.apache.org by Stanley Xu <we...@gmail.com> on 2011/01/12 12:13:17 UTC

How to avoid hole in regions with hbase 0.20.4?

Dear All,

We are using HBase 0.20.4 on a project, and hit the following error
while inserting/updating some data in the table today:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server, retryOnlyOne=true, index=0, islastrow=false,
tries=9, numtries=10, i=0, listsize=2, region=URLTag,
http://msn.ynet.com/view.jsp\x3Foid=76005602\x26pageno=7,1294655021916
for
region URLTag,
http://msn.ent.ynet.com/view.jsp\x3Foid=49939357\x26pageno=10,1294742159472,
row 'http://msn.ent.ynet.com/view.jsp\x3Foid=75954594',
but failed after 10 attempts.
Exceptions:

at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1167)
~[hbase-0.20.4.jar:na]
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1248)
~[hbase-0.20.4.jar:na]
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
~[hbase-0.20.4.jar:na]
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:510)
~[hbase-0.20.4.jar:na]
at
com.mediav.contextual.targeting.batch.job.HBaseTaggingJob$HBaseTaggingThread.run(HBaseTaggingJob.java:164)
~[crawler-server.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
[na:1.6.0_22]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
[na:1.6.0_22]
at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_22]
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[na:1.6.0_22]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[na:1.6.0_22]
at java.lang.Thread.run(Thread.java:662) [na:1.6.0_22]

I searched the mailing list and found the following thread:
http://search-hadoop.com/m/ile3u1vHa0z1/Trying+to+contact+region+server+Some+server+0.20.4+2278&subj=error+adding+row+to+table+in+0+20+4

It shows the same log as ours. Is this kind of "hole" between regions a
bug, or are we just using HBase the wrong way? And if it is a bug in HBase,
is there any way to fix it permanently other than merging the regions, such
as upgrading to a newer version of HBase?

We can fix this by merging the regions that have the problem, but I guess we
will then hit it again.

Best wishes,
Stanley Xu

Re: Region Server on Data Node

Posted by Jean-Daniel Cryans <jd...@apache.org>.
If the file is already major compacted (flag in HFile) and there's
only one, then it won't be major compacted again. If it always stays
like that and the region never moves (and the balancer never moves any
of those blocks), then yes, it could be that the region never becomes
local. But that's a lot of ifs.
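
As a rough illustration of the rule described above (a simplified model, not HBase's actual compaction code), the decision can be written as a small predicate. The `HFileInfo` class and the function name are hypothetical, introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HFileInfo:
    """Hypothetical stand-in for a store file and its 'major compacted' flag."""
    name: str
    major_compacted: bool


def major_compaction_would_rewrite(files: List[HFileInfo]) -> bool:
    """Return True if a major compaction would rewrite the files.

    Models only the rule stated in this message: a single file that is
    already flagged as major compacted is left alone.
    """
    if not files:
        return False  # nothing to compact
    if len(files) == 1 and files[0].major_compacted:
        return False  # already one major-compacted file: skipped
    return True
```

Under this model, a read-only region with one major-compacted file is never rewritten, so its blocks stay wherever HDFS originally placed them.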

J-D

On Wed, Jan 12, 2011 at 5:04 PM, M. C. Srivas <mc...@gmail.com> wrote:
> Is a region that is never modified still compacted every 24 hrs? If not,
> given that data is typically read more often than written, is there a
> possibility that the region may never become "local"?

Re: Region Server on Data Node

Posted by "M. C. Srivas" <mc...@gmail.com>.
Is a region that is never modified still compacted every 24 hrs? If not,
given that data is typically read more often than written, is there a
possibility that the region may never become "local"?


On Wed, Jan 12, 2011 at 10:22 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> The region server knows nothing about the file locality. The magic
> happens between the DFSClient and the Namenode; in HDFS, new files
> will have one block on the local datanode when it's possible, but
> existing ones won't be moved. One thing though is that the region
> server compacts files as new ones get flushed, so those rewritten
> files will be local. Also there's one major compaction per day (if
> needed), so after roughly 24h the files served by the region server
> should have one block each on the local datanode.
>
> J-D

Re: Region Server on Data Node

Posted by Jean-Daniel Cryans <jd...@apache.org>.
The region server knows nothing about the file locality. The magic
happens between the DFSClient and the Namenode; in HDFS, new files
will have one block on the local datanode when it's possible, but
existing ones won't be moved. One thing though is that the region
server compacts files as new ones get flushed, so those rewritten
files will be local. Also there's one major compaction per day (if
needed), so after roughly 24h the files served by the region server
should have one block each on the local datanode.
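
The effect described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: replication factor 3, a compaction that preserves the block count, and an HDFS that always places the first replica of a newly written block on the writer's own datanode. The names `StoreFile`, `locality`, and `compact` are hypothetical and are not HBase or HDFS APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoreFile:
    name: str
    # One inner list per block, holding the datanodes that store a replica.
    block_hosts: List[List[str]]


def locality(files: List[StoreFile], host: str) -> float:
    """Fraction of blocks that have a replica on `host`."""
    blocks = [replicas for f in files for replicas in f.block_hosts]
    if not blocks:
        return 0.0
    local = sum(1 for replicas in blocks if host in replicas)
    return local / len(blocks)


def compact(files: List[StoreFile], host: str, other_hosts: List[str]) -> List[StoreFile]:
    """Rewrite all files into one new file written from `host`.

    Assumption: each new block gets its first replica on the writing host
    and the remaining two replicas on other datanodes.
    """
    n_blocks = sum(len(f.block_hosts) for f in files)
    new_blocks = [[host] + other_hosts[:2] for _ in range(n_blocks)]
    return [StoreFile("compacted", new_blocks)]


# A region server on dn1 serving files that were mostly written elsewhere.
files = [
    StoreFile("a", [["dn2", "dn3", "dn4"], ["dn2", "dn3", "dn4"]]),
    StoreFile("b", [["dn1", "dn2", "dn3"]]),
]
before = locality(files, "dn1")
after = locality(compact(files, "dn1", ["dn2", "dn3", "dn4"]), "dn1")
```

In this model only one of three blocks is local before the rewrite, and every block is local afterwards, which is why files become local only once they are rewritten.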

J-D

On Wed, Jan 12, 2011 at 10:17 AM, Peter Haidinyak <ph...@local.com> wrote:
> Thanks, I had thought I had a data node running on that machine but I didn't. If I set up a data node on the machine, will HBase automagically move its files into Hadoop? (hope, hope).
>
> Thanks again.
>
> -Pete

RE: Region Server on Data Node

Posted by Peter Haidinyak <ph...@local.com>.
Thanks, I had thought I had a data node running on that machine but I didn't. If I set up a data node on the machine, will HBase automagically move its files into Hadoop? (hope, hope).

Thanks again.

-Pete

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Wednesday, January 12, 2011 10:12 AM
To: user@hbase.apache.org
Subject: Re: Region Server on Data Node

You don't have to, but it's best to do it. This will help you
understand why:
http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

J-D


Re: Region Server on Data Node

Posted by Jean-Daniel Cryans <jd...@apache.org>.
You don't have to, but it's best to do it. This will help you
understand why:
http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

J-D

On Wed, Jan 12, 2011 at 10:04 AM, Peter Haidinyak <ph...@local.com> wrote:
> Hi,
>  This might be a really dumb question, but do you need to run a region server on a machine that is being used as a Hadoop data node? If not, what are the performance penalties?
>
> Thanks
>
> -Pete
>

Region Server on Data Node

Posted by Peter Haidinyak <ph...@local.com>.
Hi,
  This might be a really dumb question, but do you need to run a region server on a machine that is being used as a Hadoop data node? If not, what are the performance penalties?

Thanks

-Pete

Re: How to avoid hole in regions with hbase 0.20.4?

Posted by Stanley Xu <we...@gmail.com>.
Dear St.Ack,

We checked and found the hole through the web interface at

http://192.168.11.81:60010/table.jsp?name=URLTag

and did a merge to fix the problem. Thanks for your suggestion; we will
check the master log to see whether we can find where the hole first came
from, and report back here with what we find.

Thanks a lot.

Best wishes,
Stanley Xu



On Thu, Jan 13, 2011 at 12:39 AM, Stack <st...@duboce.net> wrote:

> First, please update from 0.20.4.  It has a well-known deadlock.  Go to
> 0.20.6.
>
> Second, can you verify you indeed have a hole in your table?   Dump
> the .META. content to a file first.  This might make debugging easier:
>
> $ echo "scan '.META.'" | ./bin/hbase shell > /tmp/meta.txt
>
> Then, taking the row below, try to work out which region's range is
> responsible for it, or whether there is a hole where this row would land
> (remember that regions are demarcated by a start row (inclusive) and an
> end row (exclusive)).  Otherwise, if the region IS online, try fetching
> from it.  Try fetching its start row.  Do you get the same issue as below?
>
> Next, grep for this problematic region in the master logs.  Try to figure
> out its provenance.  What happened to it, and when?  Where was it deployed?
> Is it doubly assigned (something that happened with higher frequency back
> in the 0.20.4 days)?
>
> See how far you get with the above then come back here with what you've
> learned.
>
> Good luck.
> St.Ack

Re: How to avoid hole in regions with hbase 0.20.4?

Posted by Stack <st...@duboce.net>.
First, please update from 0.20.4.  It has a well-known deadlock.  Go to 0.20.6.

Second, can you verify you indeed have a hole in your table?   Dump
the .META. content to a file first.  This might make debugging easier:

$ echo "scan '.META.'" | ./bin/hbase shell > /tmp/meta.txt

Then, taking the row below, try to work out which region's range is
responsible for it, or whether there is a hole where this row would land
(remember that regions are demarcated by a start row (inclusive) and an
end row (exclusive)).  Otherwise, if the region IS online, try fetching
from it.  Try fetching its start row.  Do you get the same issue as below?
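
The check described above can be sketched in a few lines. This is a minimal illustration, assuming the (start row, end row) pairs for the table have already been pulled out of /tmp/meta.txt by hand; parsing the shell output is not shown, and the function names and sample keys are hypothetical.

```python
from typing import List, Optional, Tuple

# (start_row inclusive, end_row exclusive); b"" means open-ended.
Region = Tuple[bytes, bytes]


def find_holes(regions: List[Region]) -> List[Tuple[bytes, bytes]]:
    """Return (previous_end, next_start) for every gap between regions."""
    ordered = sorted(regions, key=lambda r: r[0])
    holes = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Contiguous regions satisfy prev_end == next_start.
        if prev_end != b"" and prev_end < next_start:
            holes.append((prev_end, next_start))
    return holes


def covering_region(regions: List[Region], row: bytes) -> Optional[Region]:
    """Return the region whose [start, end) range contains `row`, if any."""
    for start, end in regions:
        if start <= row and (end == b"" or row < end):
            return (start, end)
    return None


# Made-up keys: nothing covers the range [d, f).
regions = [(b"", b"b"), (b"b", b"d"), (b"f", b"")]
holes = find_holes(regions)            # [(b"d", b"f")]
owner = covering_region(regions, b"e")  # None: the row lands in the hole
```

The sketch only reports gaps. Overlapping ranges (for example from a doubly assigned region) would need a separate check where the previous end row sorts after the next start row.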

Next, grep for this problematic region in the master logs.  Try to figure
out its provenance.  What happened to it, and when?  Where was it deployed?
Is it doubly assigned (something that happened with higher frequency back
in the 0.20.4 days)?

See how far you get with the above then come back here with what you've learned.

Good luck.
St.Ack

On Wed, Jan 12, 2011 at 3:13 AM, Stanley Xu <we...@gmail.com> wrote:
> Dear All,
>
> We are using the hbase 0.20.4 on a project, and met the following errors
> while insert/update some data to the table today:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server, retryOnlyOne=true, index=0, islastrow=false,
> tries=9, numtries=10, i=0, listsize=2, region=URLTag,
> http://msn.ynet.com/view.jsp\x3Foid=76005602\x26pageno=7,1294655021916
> for
> region URLTag,
> http://msn.ent.ynet.com/view.jsp\x3Foid=49939357\x26pageno=10,1294742159472,
> row 'http://msn.ent.ynet.com/view.jsp\x3Foid=75954594',
> but failed after 10 attempts.
> Exceptions:
>
> at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1167)
> ~[hbase-0.20.4.jar:na]
> at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1248)
> ~[hbase-0.20.4.jar:na]
> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
> ~[hbase-0.20.4.jar:na]
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:510)
> ~[hbase-0.20.4.jar:na]
> at
> com.mediav.contextual.targeting.batch.job.HBaseTaggingJob$HBaseTaggingThread.run(HBaseTaggingJob.java:164)
> ~[crawler-server.jar:na]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> [na:1.6.0_22]
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [na:1.6.0_22]
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_22]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> [na:1.6.0_22]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> [na:1.6.0_22]
> at java.lang.Thread.run(Thread.java:662) [na:1.6.0_22]
>
> I have searched the mail list and find the following link:
> http://search-hadoop.com/m/ile3u1vHa0z1/Trying+to+contact+region+server+Some+server+0.20.4+2278&subj=error+adding+row+to+table+in+0+20+4
>
> Which has the same log as what I met, I am wondering to know if this kind of
> "hole" between regions is a bug?  Or we just use the hbase in a wrong way?
> And is there anyway I could fix it except merging the regions permanently if
> it is a bug in hbase? Like upgrade to a newer version of hbase?
>
> We could fix this by merge the regions that has the problem, but I guess we
> will then meet it again.
>
> Best wishes,
> Stanley Xu
>