Posted to user@hbase.apache.org by Ryan Rawson <ry...@gmail.com> on 2010/08/07 00:08:49 UTC

Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

Hi,

When you run into this problem, it's usually a sign of a META problem;
specifically, you have a 'hole' in the META table.

The META table contains a series of keys like so:
table,start_row1,<timestamp>    [data]
table,start_row2,<timestamp>    [data]

etc

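When we search for the region that holds a given row, we build a key
like 'table,my_row,9*19' (the row followed by a run of 9s in place of
the timestamp, so the key sorts after every real entry for that start
row) and do a search called 'closestRowBefore'.  This finds the region
that contains the row.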
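A toy Java sketch of that lookup follows. It assumes a simplified model rather than HBase's actual code: META is a TreeMap keyed by plain 'table,start_row,timestamp' strings, 'closestRowBefore' is emulated with TreeMap.floorEntry, the nineteen 9s are taken literally from '9*19', and the region names and shortened start rows are made up. Plain String ordering stands in for HBase's own META comparator; that shortcut only holds because hex row keys, like the SHA-256 keys in this thread, use characters that all sort after ','.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of a META lookup; a sketch, not HBase source.
 * Assumption: row keys use only characters that sort after ',' (e.g. hex),
 * so plain String ordering gives the same answer as HBase's META comparator.
 */
public class MetaLookupSketch {
    // Nineteen 9s ("9*19"): sorts after the decimal form of any
    // non-negative long, i.e. after every real timestamp for the same row.
    static final String NINES = "9".repeat(19);

    // META rows: "table,start_row,timestamp" -> region name (hypothetical).
    private final TreeMap<String, String> meta = new TreeMap<>();

    void addRegion(String table, String startRow, long timestamp, String regionName) {
        meta.put(table + "," + startRow + "," + timestamp, regionName);
    }

    /** closestRowBefore-style lookup: the greatest META key <= the probe key. */
    String locate(String table, String row) {
        String probe = table + "," + row + "," + NINES;
        Map.Entry<String, String> e = meta.floorEntry(probe);
        if (e == null || !e.getKey().startsWith(table + ",")) {
            return null; // nothing for this table at or before the row
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        MetaLookupSketch m = new MetaLookupSketch();
        m.addRegion("filestore", "", 1L, "region-A");     // first region: empty start row
        m.addRegion("filestore", "bdfa", 2L, "region-B"); // start rows shortened for readability
        m.addRegion("filestore", "be0b", 3L, "region-C");
        System.out.println(m.locate("filestore", "be16")); // region-C
        System.out.println(m.locate("filestore", "be00")); // region-B
    }
}
```

Because only the start row is in the key, the lookup never consults end rows; it simply trusts that the entry it lands on is the right one.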

Now notice that only the start row goes into the key.  Each region
has a start_row and an end_row, and the regions are mutually exclusive
and together form complete coverage of the key space.  If the META row
for a region were missing, we'd consistently find the wrong region (the
one before the gap), and the regionserver would reject the request
(correctly so).
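The same idea, reduced to a map from start row to (start, end), shows how a missing META row turns into a rejected put. This is a hypothetical sketch, not HBase code: the keys are the ones from the log in this thread shortened to four hex characters, Strings stand in for byte arrays, and an empty end row is assumed to mean "no upper bound".

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of how a META "hole" becomes a rejected put (toy model, not HBase code).
 * Regions are keyed by start row; hex row keys are compared as Strings.
 */
public class MetaHoleSketch {
    static final class Region {
        final String start, end;

        Region(String start, String end) { this.start = start; this.end = end; }

        // Regionserver-side check: start <= row < end (empty end = unbounded).
        boolean contains(String row) {
            return row.compareTo(start) >= 0 && (end.isEmpty() || row.compareTo(end) < 0);
        }
    }

    // start row -> region, as the client would read it from META
    final TreeMap<String, Region> meta = new TreeMap<>();

    void add(String start, String end) { meta.put(start, new Region(start, end)); }

    /** Client-side lookup: the region with the greatest start row <= row. */
    Region locate(String row) {
        Map.Entry<String, Region> e = meta.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        MetaHoleSketch m = new MetaHoleSketch();
        m.add("", "bdfa");
        m.add("bdfa", "be0b");
        // No META row for the region starting at "be0b": that is the hole.
        Region r = m.locate("be16");
        System.out.println(r.start + " contains be16? " + r.contains("be16")); // bdfa contains be16? false

        m.add("be0b", ""); // restore the missing entry
        System.out.println(m.locate("be16").contains("be16")); // true
    }
}
```

With the entry missing, every put for rows between "be0b" and the next start row is routed to the "bdfa" region and rejected, which matches the "happens repeatedly on a specific item" symptom.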

That is probably what is happening here.  Check the table dump in the
master web UI and see if you can find a 'hole': a place where one
region's end key doesn't match up with the next region's start key.
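The visual check on the table dump amounts to a walk over the regions in start-row order. The sketch below assumes the dump has already been copied into a start row -> end row map (extracting it is left out), and the method name findMismatches is made up. It reports any end/start mismatch, so it flags overlaps as well as holes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of the "look for a hole" check: report every place where a region's
 * end key is not the next region's start key. Input format is assumed.
 */
public class MetaHoleChecker {
    static List<String> findMismatches(TreeMap<String, String> startToEnd) {
        List<String> problems = new ArrayList<>();
        String expectedStart = ""; // the first region should start at the empty row
        for (Map.Entry<String, String> e : startToEnd.entrySet()) {
            if (!e.getKey().equals(expectedStart)) {
                problems.add("end key '" + expectedStart
                        + "' does not match next start key '" + e.getKey() + "'");
            }
            expectedStart = e.getValue();
        }
        if (!expectedStart.isEmpty()) {
            problems.add("last region ends at '" + expectedStart + "' instead of the empty row");
        }
        return problems;
    }

    public static void main(String[] args) {
        TreeMap<String, String> dump = new TreeMap<>();
        dump.put("", "bdfa");
        dump.put("bdfa", "be0b");
        dump.put("c000", ""); // hypothetical next region; nothing starts at "be0b"
        findMismatches(dump).forEach(System.out::println);
        // end key 'be0b' does not match next start key 'c000'
    }
}
```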

If that is the case, there is a script, add_table.rb, which is used to
fix these things.

-ryan

On Fri, Aug 6, 2010 at 2:59 PM, Stuart Smith <st...@yahoo.com> wrote:
> Hello,
>
>  I'm running hbase 0.20.5, and seeing Puts() fail repeatedly when trying to insert a specific item into the database.
>
> Client side I see:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, region=filestore,bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b,1279604506836 for region filestore,
>
> I then looked up which node was hosting the given region (filestore,bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b) in the GUI, and found the following debug message in the regionserver log:
>
> 2010-08-06 14:23:47,414 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts interrupted at index=0 because:Requested row out of range for HRegion filestore,bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b,1279604506836, startKey='bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b', getEndKey()='be0bc7b3f8bc2a30910b9c758b47cdb730a4691e93f92abb857a2dcc7aefa633', row='be1681910b02db5da061659c2cb08f501a135c2f065559a37a1761bf6e203d1d'
>
>
> Which appears to be coming from:
>
> /regionserver/HRegionServer.java:1786:      LOG.debug("Batch puts interrupted at index=" + i + " because:" +
>
> Which is coming from:
>
> ./java/org/apache/hadoop/hbase/regionserver/HRegion.java:1658:      throw new WrongRegionException("Requested row out of range for " +
>
> This happens repeatedly on a specific item over at least a day or so, even when not much is happening with the cluster.
>
> As far as I can tell, it looks like the logic to select the correct region for a given row is wrong. The row is indeed not in the correct range (at least from what I can tell of the exception thrown), and the check in HRegion.java:1658:
>
>  /** Make sure this is a valid row for the HRegion */
>  private void checkRow(final byte [] row) throws IOException {
>    if(!rowIsInRange(regionInfo, row)) {
>
> Is correctly rejecting the Put().
>
> So it appears the error would be somewhere in:
> HRegion.java:1550:
>  private void put(final Map<byte [],List<KeyValue>> familyMap,
>      boolean writeToWAL) throws IOException {
>
> Which appears to be the actual guts of the insert operation.
> However, I don't know enough about the design of HRegions to really decipher this method. I'll dig into it more, but I thought it might be more efficient just to ask you guys first.
>
> Any ideas?
>
> I can update to 0.20.6, but I don't see any fixed JIRAs in 0.20.6 that seem related... I could be wrong. I'm not sure what I should do next. Do you need any more information?
>
> Note that I am inserting files into the database, using each file's sha256sum as the key. The file that is failing does indeed have a sha that corresponds to the key in the message above (and is out of range).
>
> Take care,
>  -stu
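The regionserver's rejection can be re-checked by hand with the bounds printed in the log line above. This sketch applies the rule start <= row < end with unsigned byte comparison, treating an empty key as unbounded; that rule is assumed to mirror what HRegion.rowIsInRange checks, and the code is a reimplementation for illustration, not the HBase source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Re-checks the row from the log against the region bounds it printed (sketch). */
public class RowRangeCheck {
    /** start <= row < end, comparing bytes as unsigned; empty start/end = unbounded. */
    static boolean rowIsInRange(byte[] start, byte[] end, byte[] row) {
        boolean afterStart = Arrays.compareUnsigned(start, row) <= 0; // empty start sorts first
        boolean beforeEnd = end.length == 0 || Arrays.compareUnsigned(row, end) < 0;
        return afterStart && beforeEnd;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] start = bytes("bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b");
        byte[] end = bytes("be0bc7b3f8bc2a30910b9c758b47cdb730a4691e93f92abb857a2dcc7aefa633");
        byte[] row = bytes("be1681910b02db5da061659c2cb08f501a135c2f065559a37a1761bf6e203d1d");
        // "be1..." sorts after "be0...", so the row lies past this region's end key.
        System.out.println(rowIsInRange(start, end, row)); // false
    }
}
```

So the range check itself is behaving; the question is why the client keeps sending this row to this region, which is where the META lookup comes in.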

Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

Posted by Stuart Smith <st...@yahoo.com>.
Just to follow up: I ran add_table, as I had done once before when I lost a table, and it fixed the error.

Thanks!

Take care,
  -stu

Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

Posted by Stuart Smith <st...@yahoo.com> on Friday, August 6, 2010, 6:50 PM.
Hello Ryan,

  Yup. There's a hole, exactly where it should be.

I used add_table.rb once before, and am no expert on it.
All I have is a note written down:

To recover lost tables:
./hbase org.jruby.Main add_table.rb /hbase/filestore

Is there anything else I need to know? Do I just run the script like so?
Does anything need to be shut down before I do?

Thanks!

Take care,
  -stu

