Posted to issues@hbase.apache.org by "Wellington Chevreuil (Jira)" <ji...@apache.org> on 2020/09/01 08:50:00 UTC

[jira] [Comment Edited] (HBASE-24920) A tool to rewrite corrupted HFiles

    [ https://issues.apache.org/jira/browse/HBASE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188273#comment-17188273 ] 

Wellington Chevreuil edited comment on HBASE-24920 at 9/1/20, 8:49 AM:
-----------------------------------------------------------------------

{quote}completely re-implement needed pieces in hbase operator tool so that there are no dependencies on hbase version (making this tool hard to maintain).
{quote}
We've been doing this already in some parts of the operator tools code (take a look at HBCKMetaTableAccessor and HBCKFsUtils, which are the operator tools' duplicates of hbase's MetaTableAccessor and FsUtils). If the effort of porting/reimplementing these private classes is not too complex, I would suggest going with this option.


was (Author: wchevreuil):
{quotes}completely re-implement needed pieces in hbase operator tool so that there are no dependencies on hbase version (making this tool hard to maintain).
{quotes}
We've been doing this already in some parts of operator tools code (take a look at HBCKMetaTableAccessor and HBCKFsUtils, which are the operator tools dups of hbase MetaTableAccessor and FsUtils). If the efforts of porting/reimplement these private classes is not too complex, I would suggest to go with this option.

> A tool to rewrite corrupted HFiles
> ----------------------------------
>
>                 Key: HBASE-24920
>                 URL: https://issues.apache.org/jira/browse/HBASE-24920
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: hbase-operator-tools
>            Reporter: Andrey Elenskiy
>            Priority: Major
>
> Typically I have been dealing with corrupted HFiles (due to loss of hdfs blocks) by just removing them. However, it always seemed wasteful to throw away an entire HFile (which can be hundreds of gigabytes) just because one hdfs block (128MB) is missing.
> I think there's a possibility for a tool that can rewrite an HFile by skipping corrupted blocks. 
> There can be multiple types of issues with hdfs blocks, but any of them can be treated as if the block doesn't exist:
> 1. All the replicas can be lost
> 2. The block can be corrupted due to some bug in hdfs (I've recently run into HDFS-15186 by experimenting with EC).
> At its simplest, the tool could be a local mapreduce job (mapper only) with a custom HFile reader input that seeks to the next DATABLK magic to skip corrupted hdfs blocks.
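
A minimal sketch of the seek-to-next-block idea above, assuming the file (or a readable prefix of it) is in memory. HFile data blocks start with the ASCII magic "DATABLK*"; the class and method names here are hypothetical, not part of HBase or the operator tools:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: locate candidate data-block boundaries in a raw
// byte buffer by scanning for the HFile data-block magic "DATABLK*".
// A recovery tool could resume reading at the next boundary after a
// corrupted region instead of discarding the whole HFile.
public class DataBlockScanner {

    static final byte[] MAGIC = "DATABLK*".getBytes(StandardCharsets.US_ASCII);

    // Return the offset of every occurrence of the magic in buf.
    static List<Long> findBlockOffsets(byte[] buf) {
        List<Long> offsets = new ArrayList<>();
        outer:
        for (int i = 0; i + MAGIC.length <= buf.length; i++) {
            for (int j = 0; j < MAGIC.length; j++) {
                if (buf[i + j] != MAGIC[j]) {
                    continue outer;
                }
            }
            offsets.add((long) i);
        }
        return offsets;
    }

    public static void main(String[] args) {
        byte[] data = "garbageDATABLK*payload1moreDATABLK*payload2"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(DataBlockScanner.findBlockOffsets(data)); // prints [7, 27]
    }
}
```

In a real tool the scan would run over an FSDataInputStream rather than a byte array, and each candidate offset would still need its block header validated (and checksum verified) before the block is copied into the rewritten file.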



--
This message was sent by Atlassian Jira
(v8.3.4#803005)