Posted to issues@hbase.apache.org by "Wellington Chevreuil (Jira)" <ji...@apache.org> on 2021/05/06 10:20:00 UTC
[jira] [Commented] (HBASE-25857) HBase bulk import fails with exitCode -100
[ https://issues.apache.org/jira/browse/HBASE-25857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340132#comment-17340132 ]
Wellington Chevreuil commented on HBASE-25857:
----------------------------------------------
Please use the HBase Dev List <de...@hbase.apache.org> for such discussions. Jira should be used solely for bug reports or new feature/enhancement efforts. This one has not identified a particular HBase bug; it's more of a user issue, apparently with YARN.
> HBase bulk import fails with exitCode -100
> ------------------------------------------
>
> Key: HBASE-25857
> URL: https://issues.apache.org/jira/browse/HBASE-25857
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.2.5
> Reporter: ZFabrik
> Priority: Major
>
> I want to import the data from an HBase 1.0 cluster to our new HBase 2.2.5 cluster.
> Our setup is as follows:
> * 6 data nodes with 250 GB of disk space each
> * total DFS capacity (as reported by Hadoop): 1.46 TB
> * one additional node running namenode, hmaster, resource manager
> The data I'm trying to import was created by HBase export on the 1.0 HBase cluster and takes 14.9 GB in HDFS (according to `hdfs dfs -du -h`).
> The import uses bulk output with `import.bulk.hasLargeResult=true`:
> {noformat}
> > hdfs dfs -du -h /
> 2.6 G /TMP_IMPORT
> ...
> > yarn jar $HBASE_HOME/lib/hbase-mapreduce-2.2.5.jar import \
> -Dmapreduce.map.speculative=false \
> -Dmapreduce.reduce.speculative=false \
> -Dimport.bulk.output=/HFILES \
> -Dimport.bulk.hasLargeResult=true \
> my_table /TMP_IMPORT
> {noformat}
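>
> For context, with `import.bulk.output` the Import job writes HFiles instead of issuing puts against the table, so a separate bulk-load step is still needed once the job finishes. A minimal sketch, assuming the /HFILES output path from the command above and the loader class bundled with HBase 2.x:
> {noformat}
> > hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /HFILES my_table
> {noformat}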
>
> Approximately 3 hours later the import fails with this error message:
> {noformat}
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
> at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for opt/seritrack/tt/nosql/data/yarn/usercache/seritrack/appcache/application_1620201940366_0003/output/attempt_1620201940366_0003_r_000000_1/map_40.out
> at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:446)
> at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
> at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
> at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$OnDiskMerger.merge(MergeManagerImpl.java:549)
> at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
> {noformat}
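>
> The DiskErrorException above is thrown when LocalDirAllocator cannot find any configured local directory with enough free space for the on-disk merge. If the node managers have more volumes available, the spill location is controlled by yarn.nodemanager.local-dirs; a minimal sketch (the /data1 and /data2 mount points are hypothetical):
> {noformat}
> <!-- yarn-site.xml -->
> <property>
>   <name>yarn.nodemanager.local-dirs</name>
>   <value>/data1/yarn/local,/data2/yarn/local</value>
> </property>
> {noformat}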
>
> The Yarn web UI reports this:
> {noformat}
> AM Container for appattempt_1620201940366_0003_000001 exited with exitCode: -100
> Failing this attempt. Diagnostics: Container released on a *lost* node.
> For more detailed output, check the application tracking page: http://master:8088/cluster/app/application_1620201940366_0003 Then click on links to logs of each attempt.
> {noformat}
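>
> For reference, exit code -100 is a YARN-level status rather than an application error; assuming the stock YARN API, it maps to this constant:
> {noformat}
> // org.apache.hadoop.yarn.api.records.ContainerExitStatus
> public static final int ABORTED = -100;
> {noformat}
> YARN releases containers with this status when it marks their node as lost, e.g. after the node stops heartbeating.
>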
> The Hadoop namenode reports:
> {noformat}
> Configured Capacity: 1.46 TB
> DFS Used: 34.26 GB (2.28%)
> Non DFS Used: 1.33 TB <<<<< !!!
> {noformat}
> We can see that YARN occupies more than 200 GB on each data node (inside .../data/yarn/usercache/xyz/appcache), for a total of 1.33 TB, which is almost the entire 1.46 TB capacity. There are more than 100 files on each data node named like `attempt_1620201940366_0003_m_000035_1_spill_37.out`, each 77 MB in size.
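>
> One way to shrink those spill files is to compress the intermediate map output. A hedged sketch using the standard MapReduce properties (Snappy is assumed to be available through the native Hadoop libraries; the remaining arguments are unchanged from the original command):
> {noformat}
> > yarn jar $HBASE_HOME/lib/hbase-mapreduce-2.2.5.jar import \
> > -Dmapreduce.map.output.compress=true \
> > -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
> > -Dmapreduce.map.speculative=false \
> > -Dmapreduce.reduce.speculative=false \
> > -Dimport.bulk.output=/HFILES \
> > -Dimport.bulk.hasLargeResult=true \
> > my_table /TMP_IMPORT
> {noformat}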
>
> So my question is: how can I use bulk import if it needs roughly 100x the input size in local disk space as shuffle cache?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)