Posted to issues@hbase.apache.org by "wuchang (Jira)" <ji...@apache.org> on 2020/06/03 01:29:00 UTC

[jira] [Updated] (HBASE-24420) BulkLoad Will Fall Into Unbelievable Retry Attempt in Some case

     [ https://issues.apache.org/jira/browse/HBASE-24420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wuchang updated HBASE-24420:
----------------------------
    Attachment: 24420.patch
        Status: Patch Available  (was: Open)

> BulkLoad Will Fall Into Unbelievable Retry Attempt in Some case
> ---------------------------------------------------------------
>
>                 Key: HBASE-24420
>                 URL: https://issues.apache.org/jira/browse/HBASE-24420
>             Project: HBase
>          Issue Type: Bug
>            Reporter: wuchang
>            Priority: Major
>         Attachments: 24420.patch
>
>
> In https://issues.apache.org/jira/browse/HBASE-14541, the retry logic changed from a configurable retry count (set by the configuration item hbase.bulkload.retries.number) to the logic below, in order to handle region splits that happen during a bulk load:
>  
> {code:java}
> int maxRetries = getConf().getInt("hbase.bulkload.retries.number", 10);
> maxRetries = Math.max(maxRetries, startEndKeys.getFirst().length + 1);
> if (maxRetries != 0 && count >= maxRetries) {
>  throw new IOException("Retry attempted " + count +
>  " times without completing, bailing out");
> }{code}
> This change caused another problem in our cluster:
>  Our table has *2000* regions, and our bulk load failed because of a configuration issue.
>  The failure was caused by a client misconfiguration that made the RegionServer fail to load the HFile. The failure is not thrown to the client as an exception; instead it is reported through the `*loaded*` field of the `*BulkLoadHFileResponse*`:
> {code:java}
> message BulkLoadHFileResponse {
>  required bool loaded = 1;
> }{code}
>  After this failure, the bulk load fell into a retry storm, and after about 200 retries our HDFS crashed with an OOM.
>  
> During all of this, however, no region split ever actually happened on the HBase table.
> I think the patch in HBASE-14541 does not handle the case where a retry can never succeed (and I think many causes can lead to such unrecoverable retries). In that case the meaningless retry attempts become a disaster, and the limit cannot be lowered through configuration, because the retry limit is bounded below by the region count of our table, which we cannot change.
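
The effect of the quoted Math.max line on the retry limit can be illustrated with a small standalone sketch. It is not HBase code: the class RetryLimitSketch and the method effectiveMaxRetries are hypothetical names, and regionCount stands in for startEndKeys.getFirst().length on the assumption that this array holds one start key per region.

```java
public class RetryLimitSketch {

    // Mirrors the two lines quoted in the description: the configured value
    // only acts as a lower bound, and the region count (plus one) wins
    // whenever it is larger.
    // Assumption: regionCount plays the role of startEndKeys.getFirst().length.
    public static int effectiveMaxRetries(int configuredRetries, int regionCount) {
        return Math.max(configuredRetries, regionCount + 1);
    }

    public static void main(String[] args) {
        int regionCount = 2000; // region count reported in this issue
        for (int configured : new int[] {1, 10, 100}) {
            System.out.println("configured=" + configured
                + " effective=" + effectiveMaxRetries(configured, regionCount));
        }
    }
}
```

For a table with 2000 regions, every configured value up to 2001 yields an effective limit of 2001, which is consistent with the report that lowering hbase.bulkload.retries.number cannot stop the retries.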



--
This message was sent by Atlassian Jira
(v8.3.4#803005)