Posted to dev@hbase.apache.org by Stephen Jiang <sy...@gmail.com> on 2015/04/23 00:52:14 UTC
HBCK question: Exception in checkRegionConsistency() would kill HBCK
I am looking at the code in HBaseFsck#checkRegionConsistency(). It checks
region consistency and repairs the corruption if requested. However, some
exceptions are to be expected from this function. For example, in one aspect
of region repair, it calls HBaseFsckRepair#waitUntilAssigned(); if a region
stays in transition for over 120 seconds, the wait times out and throws an
IOException.
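To make the timeout behavior concrete, here is a minimal sketch of a poll-until-assigned helper that throws IOException once a deadline passes. It is not the HBaseFsckRepair code: the class name AssignmentWaiter, the BooleanSupplier probe, and the pollMs parameter are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a wait-until-assigned helper; not the HBaseFsckRepair code. */
class AssignmentWaiter {
    /**
     * Polls isAssigned until it returns true or timeoutMs elapses.
     * Throws IOException on timeout, which is the behavior the post describes.
     */
    static void waitUntilAssigned(BooleanSupplier isAssigned, long timeoutMs, long pollMs)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!isAssigned.getAsBoolean()) {
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                throw new IOException("Region still in transition after " + timeoutMs + " ms");
            }
            Thread.sleep(pollMs);
        }
    }
}
```

With a 120-second timeout, any caller that does not catch this IOException passes it straight up the stack.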
The problem I see is that a single exception in checkRegionConsistency()
kills the entire hbck operation, because the exception propagates to the caller.
I think the better approach is to skip the troubled region and let hbck
continue with the other regions. In the end, users are left with only a few
regions that need additional hbck runs or a manual fix. (One possible
exception is the meta table: if a region in the meta table is not repaired
successfully, we should not continue.)
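The proposal can be sketched roughly as follows. This is only an illustration of the skip-and-continue idea, not HBaseFsck code: the RegionChecker interface, the checkAll method, and the use of a name prefix to recognize meta regions are all hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of skip-and-continue; names here are not from HBaseFsck. */
class SkipAndContinueSketch {
    /** Hypothetical per-region check-and-repair callback. */
    interface RegionChecker {
        void check(String regionName) throws IOException;
    }

    /**
     * Checks every region, skipping those that fail, and returns the skipped names.
     * A failure on a meta region is rethrown, since later checks depend on meta.
     */
    static List<String> checkAll(List<String> regions, String metaPrefix, RegionChecker checker)
            throws IOException {
        List<String> skipped = new ArrayList<>();
        for (String region : regions) {
            try {
                checker.check(region);
            } catch (IOException e) {
                if (region.startsWith(metaPrefix)) {
                    throw e; // do not continue past an unrepaired meta region
                }
                skipped.add(region); // leave for another hbck run or a manual fix
            }
        }
        return skipped;
    }
}
```

The returned list is what the user would see at the end of the run: the regions that still need attention.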
What do you think?
Thanks
Stephen
Re: HBCK question: Exception in checkRegionConsistency() would kill HBCK
Posted by Ted Yu <yu...@gmail.com>.
How about collecting the IOExceptions in checkRegionConsistencyConcurrently()
and wrapping them in a MultipleIOException:
/** Encapsulate a list of {@link IOException} into an {@link IOException} */
public class MultipleIOException extends IOException {
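A rough sketch of that idea: run the per-region checks on a thread pool, let every check finish, collect any IOExceptions, and throw a single aggregate at the end. To stay self-contained it uses a local stand-in, AggregateIOException, rather than Hadoop's MultipleIOException; that class, ConcurrentCheckSketch, and runAll are names assumed for illustration and are not the hbck implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Local stand-in for an aggregate exception, so the sketch compiles without Hadoop. */
class AggregateIOException extends IOException {
    private final List<IOException> causes;

    AggregateIOException(List<IOException> causes) {
        super(causes.size() + " region check(s) failed");
        this.causes = new ArrayList<>(causes);
    }

    List<IOException> getCauses() {
        return causes;
    }
}

class ConcurrentCheckSketch {
    /** Runs all checks, lets every one finish, then throws one aggregate if any failed. */
    static void runAll(List<Callable<Void>> checks, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<IOException> failures = new ArrayList<>();
        try {
            // invokeAll blocks until every task has completed or failed.
            List<Future<Void>> futures = pool.invokeAll(checks);
            for (Future<Void> f : futures) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    failures.add(cause instanceof IOException
                            ? (IOException) cause
                            : new IOException(cause));
                }
            }
        } finally {
            pool.shutdown();
        }
        if (!failures.isEmpty()) {
            throw new AggregateIOException(failures);
        }
    }
}
```

One failing region no longer stops the others from being checked, and the caller still gets every underlying exception through getCauses().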
Cheers
Re: HBCK question: Exception in checkRegionConsistency() would kill HBCK
Posted by Devaraj Das <dd...@hortonworks.com>.
I think it makes sense to continue to the other regions and eventually list out the regions that couldn't be "fixed".