Posted to user@hbase.apache.org by Varun Sharma <va...@pinterest.com> on 2013/04/06 22:36:42 UTC

Recovery failure during single Get()

Hi,

We have been seeing this bug for a while when we use HTable.get() to issue
a single Get call through the "Result get(Get get)" API, and I thought it
best to bring it up.

Steps to reproduce this bug:
1) Gracefully restart a region server, causing its regions to be
redistributed.
2) Client calls to a moved region keep failing, since the META cache on the
client is never purged for the region that moved.

Reason behind the bug:
1) The client continues to hit the old region server.
2) The old region server throws NotServingRegionException, which is not
handled correctly: the META cache entries for that server are never purged,
so the client keeps hitting the old server.

The cause lies in the ServerCallable code, which only purges META cache
entries on a RetriesExhaustedException, SocketTimeoutException or
ConnectException. There is no case for NotServingRegionException.
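
To make the failure mode concrete, here is a minimal standalone sketch (not
the actual ServerCallable code). It simulates a client location cache and a
retry loop for a single Get. All names in it (MetaCachePurgeSketch,
getWithRetries, simulate, the local NotServingRegionException class, the
"rs1"/"rs2" server names) are made up for illustration and are not HBase
APIs. It assumes the only failure is the region having moved.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class MetaCachePurgeSketch {

    // Hypothetical stand-in for HBase's NotServingRegionException.
    static class NotServingRegionException extends IOException {
        NotServingRegionException(String message) { super(message); }
    }

    // Simulated region server: rejects requests for regions it does not host.
    static void callServer(String server, String actualHost, String region)
            throws NotServingRegionException {
        if (!server.equals(actualHost)) {
            throw new NotServingRegionException(region + " is not online on " + server);
        }
    }

    // Retry loop for a single Get. purgeOnNsre toggles whether a
    // NotServingRegionException drops the cached location (assumed fix).
    static boolean getWithRetries(Map<String, String> cache, Map<String, String> meta,
                                  String region, boolean purgeOnNsre, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Use the cached location; fall back to a META lookup on a miss.
            String server = cache.computeIfAbsent(region, meta::get);
            try {
                callServer(server, meta.get(region), region);
                return true;
            } catch (NotServingRegionException e) {
                if (purgeOnNsre) {
                    cache.remove(region); // next attempt re-reads META
                }
                // Without the purge, the next attempt reuses the stale entry.
            }
        }
        return false;
    }

    // Region has moved from rs1 to rs2, but the client cache still says rs1.
    static boolean simulate(boolean purgeOnNsre) {
        Map<String, String> meta = new HashMap<>();
        meta.put("region-1", "rs2");
        Map<String, String> cache = new HashMap<>();
        cache.put("region-1", "rs1");
        return getWithRetries(cache, meta, "region-1", purgeOnNsre, 5);
    }

    public static void main(String[] args) {
        System.out.println("without purge: " + simulate(false));
        System.out.println("with purge:    " + simulate(true));
    }
}
```

Without the purge every attempt goes back to rs1 and the Get fails; with the
purge the second attempt looks up the new location and succeeds.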

Why is this not a problem for Scans and Puts?

a) For scans, if a region server is not hosting the region/scanner, an
UnknownScannerException is thrown, which triggers a relocateRegion() call
that refreshes the META cache for that particular region.
b) For puts, the processBatchCallback() interface in HConnectionManager is
used, which clears META cache entries for all kinds of exceptions except
DoNotRetryIOException.

Created HBASE-8285 for this.

Re: Recovery failure during single Get()

Posted by Ted Yu <yu...@gmail.com>.
Thanks for the analysis.

I left some comments on HBASE-8285.


On Sat, Apr 6, 2013 at 1:36 PM, Varun Sharma <va...@pinterest.com> wrote:

> Hi,
>
> We are observing this bug for a while when we use HTable.get() operation to
> do a single Get call using the "Result get(Get get)" API and I thought its
> best to bring it up.
>
> Steps to reproduce this bug:
> 1) Gracefull restart a region server causing regions to get redistributed.
> 2) Client call to this region keeps failing since Meta Cache is never
> purged on the client for the region that moved.
>
> Reason behind the bug:
> 1) Client continues to hit the old region server.
> 2) The old region server throws NotServingRegionException which is not
> handled correctly and the META cache entries are never purged for that
> server causing the client to keep hitting the old server.
>
> The reason lies in ServerCallable code since we only purge META cache
> entries when there is a RetriesExhaustedException, SocketTimeoutException
> or ConnectException. However, there is no case check for
> NotServingRegionException(s).
>
> Why is this not a problem for Scan(s) and Put(s) ?
>
> a) If a region server is not hosting a region/scanner, then an
> UnknownScannerException is thrown which causes a relocateRegion() call
> causing a refresh of the META cache for that particular region.
> b) For put(s), the processBatchCallback() interface in HConnectionManager
> is used which clears out META cache entries for all kinds of exceptions
> except DoNotRetryException.
>
> Created HBASE 8285 for this.
>
