Posted to user@hbase.apache.org by deanforwever2010 <de...@gmail.com> on 2012/08/10 11:26:32 UTC

After region split, client did not get a result within the timeout, so the cached location was not updated and the client still queries the old region ID

In the region server's log:

2012-08-10 11:49:50,796 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; Region is not online:
test_list,zWPpyme,1342510667492.91486e7fa0ac39048276848a2618479b.

After the region split, the client did not get a result within my timeout
(1.5 seconds), so my program canceled the task. As a result, the
HConnectionManager did not delete the cached location, and the client
keeps querying the old region ID, which no longer exists.

Moreover, some of my processes updated the region location info and some
did not. I'm sure the network is fine.

How can I fix this problem? Why does it take so long to detect the new
regions?

Re: After region split, client did not get a result within the timeout, so the cached location was not updated and the client still queries the old region ID

Posted by deanforwever2010 <de...@gmail.com>.
So it is very weird that on some of my servers I did not get the error, and
so the cache was not cleaned.


Re: After region split, client did not get a result within the timeout, so the cached location was not updated and the client still queries the old region ID

Posted by N Keywal <nk...@gmail.com>.
If it's a single row, I would expect the server to return the error
immediately. Then you will have the sleep I was mentioning previously,
but the cache should be cleaned before the sleep...


Re: After region split, client did not get a result within the timeout, so the cached location was not updated and the client still queries the old region ID

Posted by deanforwever2010 <de...@gmail.com>.
Hi Keywal,
My HBase version is 0.94.
My query just gets a limited set of columns from one row.
I run it as a callable task with a 1.5-second timeout, so maybe it did not
fail but was canceled by my process, and so the region cache was not cleared
even after many requests.
My question is: why should it take so long to fail? It also behaves
differently across my servers, and there is no problem with the network.
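
To make the suspected sequence concrete, here is a minimal sketch in plain Java. It is not HBase code: the class name CancelBeforeErrorSketch, the method evictedWithin, and the simulated call are all hypothetical, and the "eviction" is just a flag. It only models the idea that a task canceled before the error comes back never reaches the step that would clear the stale cache entry.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of a timed-out task; hypothetical names, no HBase classes involved. */
final class CancelBeforeErrorSketch {

    /**
     * Runs a simulated call that takes callMillis before its error handling runs,
     * and gives up after timeoutMillis. Returns true if the simulated cache
     * eviction step was reached before the caller gave up.
     */
    static boolean evictedWithin(long callMillis, long timeoutMillis) {
        AtomicBoolean evicted = new AtomicBoolean(false);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> task = pool.submit(() -> {
                Thread.sleep(callMillis); // simulated slow call that ends in an error
                evicted.set(true);        // simulated "remove stale cache entry" step
                return "ready to retry";
            });
            try {
                task.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // The caller gives up first: the worker is interrupted while it is
                // still waiting, so the eviction step above is never reached.
                task.cancel(true);
            } catch (InterruptedException | ExecutionException e) {
                task.cancel(true);
            }
            return evicted.get();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a call slower than the timeout, evictedWithin returns false; with a call faster than the timeout, it returns true. Whether the real client behaves this way in 0.94 is exactly what I am asking about.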


Re: After region split, client did not get a result within the timeout, so the cached location was not updated and the client still queries the old region ID

Posted by N Keywal <nk...@gmail.com>.
Hi,

What are your queries exactly? What's the HBase version?

The mechanism is:
- There is a location cache, per HConnection, on the client.
- The client first tries the region server in its cache.
- If that fails, the client removes this entry from the cache and enters
the retry loop.
- There is a limited number of retries and a sleep between retries.
- Most of the time, the client will connect to meta to get the new location.
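
The steps above can be sketched roughly as follows. This is a toy model in plain Java, not the actual HConnectionManager code: the class LocationCacheSketch, the NotServingException type, the metaLookup function and the row-keyed cache are all assumptions made for illustration (the real cache is keyed by region, not by row).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Toy model of a client-side location cache with a retry loop; not HBase code. */
final class LocationCacheSketch {

    /** Hypothetical stand-in for the error a server returns for a region it no longer hosts. */
    static final class NotServingException extends RuntimeException {
        NotServingException(String message) { super(message); }
    }

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> metaLookup; // row -> current server (simulated meta)
    private final int maxRetries;
    private final long pauseMillis;

    LocationCacheSketch(Function<String, String> metaLookup, int maxRetries, long pauseMillis) {
        this.metaLookup = metaLookup;
        this.maxRetries = maxRetries;
        this.pauseMillis = pauseMillis;
    }

    void seed(String row, String server) { cache.put(row, server); }

    String cached(String row) { return cache.get(row); }

    /** Tries the cached server first; on failure evicts the entry, sleeps, and retries. */
    String get(String row, BiFunction<String, String, String> call) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            // Use the cached location, or ask the simulated meta if nothing is cached.
            String server = cache.computeIfAbsent(row, metaLookup);
            try {
                return call.apply(server, row);
            } catch (NotServingException e) {
                cache.remove(row);             // the stale entry is removed before the sleep
                if (attempt < maxRetries) {
                    pause();
                }
            }
        }
        throw new IllegalStateException("retries exhausted for row " + row);
    }

    private void pause() {
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

In this model, a call that hits a stale location fails once, the entry is evicted, and the next attempt goes through the lookup and reaches the new server. Note that the eviction happens only once the error has actually been received by the client.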

When there are multiple queries, before HBASE-5924, the errors are
analyzed after the other region servers have returned as well. That
could be an explanation. HBASE-5877 exists as well, but only for
moves, not for splits...

Cheers,

N.

