Posted to issues@hbase.apache.org by "zhuobin zheng (Jira)" <ji...@apache.org> on 2021/03/08 07:37:00 UTC

[jira] [Commented] (HBASE-21183) loadIncrementalHFiles sometimes throws FileNotFoundException on retry

    [ https://issues.apache.org/jira/browse/HBASE-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297137#comment-17297137 ] 

zhuobin zheng commented on HBASE-21183:
---------------------------------------

This may be caused by https://issues.apache.org/jira/browse/HBASE-19065:
 # The client requests a bulk load, and the server moves the file to /hbase/data/\{namespace}/\{table}/\{region}/\{columnfamily}/.
 # A concurrent flush causes the bulk load to fail.
 # The bulk load client retries, and the retry fails because the file no longer exists.
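The three steps above can be reproduced in miniature. The following is a hypothetical, local-filesystem sketch, not HBase or HDFS code: the class, method and path names are made up for illustration, and it assumes (as the steps suggest, but the code path is not confirmed here) that the server-side move happens before the failure and is not undone before the client retries.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical simulation of the suspected race using java.nio only.
 * No HBase/HDFS APIs are used; all names are illustrative.
 */
public class BulkLoadRetryRace {

    /** Step 1: the "server" moves the staged HFile into the column family directory. */
    static Path serverMovesFile(Path hfile, Path familyDir) throws IOException {
        Files.createDirectories(familyDir);
        return Files.move(hfile, familyDir.resolve(hfile.getFileName()),
                StandardCopyOption.ATOMIC_MOVE);
    }

    /** Step 3: the "client" retries against the original input path. */
    static String clientRetry(Path hfile) {
        try {
            // Stat-ing the file stands in for re-grouping/splitting the HFile on retry.
            Files.size(hfile);
            return "retry ok";
        } catch (NoSuchFileException e) {
            // java.nio's local analogue of the FileNotFoundException in the logs.
            return "FileNotFound: " + hfile.getFileName();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    static String simulate() throws IOException {
        Path root = Files.createTempDirectory("bulkload-race");
        Path input = Files.createDirectories(root.resolve("input"));
        Path hfile = Files.write(input.resolve("hfile-0001"), new byte[] {1, 2, 3});
        Path familyDir = root.resolve("data/ns/table/region/cf");

        serverMovesFile(hfile, familyDir);

        // Step 2 (assumed): a concurrent flush makes the load fail AFTER the move,
        // and the file is not moved back to the input directory.
        boolean loadSucceeded = false;

        return loadSucceeded ? "loaded" : clientRetry(hfile);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(simulate());
    }
}
```

Running `main` prints `FileNotFound: hfile-0001`: the retry looks for the file where the client staged it, but the earlier move has already relocated it.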

> loadIncrementalHFiles sometimes throws FileNotFoundException on retry
> ---------------------------------------------------------------------
>
>                 Key: HBASE-21183
>                 URL: https://issues.apache.org/jira/browse/HBASE-21183
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Tim Robertson
>            Priority: Major
>
> On a nightly batch job that prepares hundreds of well-balanced HFiles of around 2 GB each, we see sporadic bulk load failures.
> I'm unable to paste the logs here (different network), but on a failing day they show, e.g., the following:
> {code:java}
> Trying to load hfile... /my/input/path/...
> Attempt to bulk load region containing ... failed. This is recoverable and will be retried
> Attempt to bulk load region containing ... failed. This is recoverable and will be retried
> Attempt to bulk load region containing ... failed. This is recoverable and will be retried
> Split occurred while grouping HFiles, retry attempt 1 with 3 files remaining to group or split
> Trying to load hfile...
> IOException during splitting
> java.io.FileNotFoundException: File does not exist: /my/input/path/...
> {code}
> The exception gets thrown from [this line|https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java#L685].
>
> I should note that this is a secure cluster (CDH 5.12.x).
> I've tried to go through the code and don't spot an obvious race condition. I don't see any related changes in the later 1.x versions, so I presume this also exists in 1.5.
> I have yet to get access to the NameNode audit logs from when this occurs, to trace the rename() calls around these particular files.
> I don't see timeouts like those in HBASE-4030.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)