Posted to issues@hbase.apache.org by "Wellington Chevreuil (Jira)" <ji...@apache.org> on 2021/05/06 10:20:00 UTC
[jira] [Commented] (HBASE-25857) HBase bulk import fails with exitCode -100
[ https://issues.apache.org/jira/browse/HBASE-25857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340132#comment-17340132 ]
Wellington Chevreuil commented on HBASE-25857:
----------------------------------------------
Please use the HBase Dev List <de...@hbase.apache.org> for such discussions. Jira should be used solely for bug reports or new feature/enhancement efforts. This one has not identified a particular HBase bug; it's more of a user issue, apparently with YARN.
> HBase bulk import fails with exitCode -100
> ------------------------------------------
>
> Key: HBASE-25857
> URL: https://issues.apache.org/jira/browse/HBASE-25857
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.2.5
> Reporter: ZFabrik
> Priority: Major
>
> I want to import the data from an HBase 1.0 cluster to our new HBase 2.2.5 cluster.
> Our setup is as follows:
> * 6 data nodes with 250 GB of disk space each
> * total DFS capacity (as reported by Hadoop): 1.46 TB
> * one additional node running namenode, hmaster, resource manager
> The data I'm trying to import was created by HBase export on the 1.0 HBase cluster and takes 14.9 GB in HDFS (according to `hdfs dfs -du -h`).
> The import uses bulk output with `import.bulk.hasLargeResult=true`:
> {noformat}
> > hdfs dfs -du -h /
> 2.6 G /TMP_IMPORT
> ...
> > yarn jar $HBASE_HOME/lib/hbase-mapreduce-2.2.5.jar import \
> -Dmapreduce.map.speculative=false \
> -Dmapreduce.reduce.speculative=false \
> -Dimport.bulk.output=/HFILES \
> -Dimport.bulk.hasLargeResult=true \
> my_table /TMP_IMPORT
> {noformat}
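>
> For context, with `import.bulk.output` the Import job writes HFiles instead of issuing puts against the table, so a separate bulk-load step is still needed once the job finishes. A minimal sketch, assuming the /HFILES output path from the command above and the loader class bundled with HBase 2.x:
> {noformat}
> > hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /HFILES my_table
> {noformat}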
>
> Approximately 3 hours later the import fails with this error message:
> {noformat}
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
> at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for opt/seritrack/tt/nosql/data/yarn/usercache/seritrack/appcache/application_1620201940366_0003/output/attempt_1620201940366_0003_r_000000_1/map_40.out
> at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:446)
> at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
> at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
> at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$OnDiskMerger.merge(MergeManagerImpl.java:549)
> at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
> {noformat}
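>
> The DiskErrorException above is thrown when LocalDirAllocator cannot find any configured local directory with enough free space for the on-disk merge. If the node managers have more volumes available, the spill location is controlled by yarn.nodemanager.local-dirs; a minimal sketch (the /data1 and /data2 mount points are hypothetical):
> {noformat}
> <!-- yarn-site.xml -->
> <property>
>   <name>yarn.nodemanager.local-dirs</name>
>   <value>/data1/yarn/local,/data2/yarn/local</value>
> </property>
> {noformat}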
>
> The Yarn web UI reports this:
> {noformat}
> AM Container for appattempt_1620201940366_0003_000001 exited with exitCode: -100
> Failing this attempt. Diagnostics: Container released on a *lost* node.
> For more detailed output, check the application tracking page: http://master:8088/cluster/app/application_1620201940366_0003 Then click on links to logs of each attempt.
> {noformat}
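>
> For reference, exit code -100 is a YARN-level status rather than an application error; assuming the stock YARN API, it maps to this constant:
> {noformat}
> // org.apache.hadoop.yarn.api.records.ContainerExitStatus
> public static final int ABORTED = -100;
> {noformat}
> YARN releases containers with this status when it marks their node as lost, e.g. after the node stops heartbeating.
>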
> The Hadoop namenode reports:
> {noformat}
> Configured Capacity: 1.46 TB
> DFS Used: 34.26 GB (2.28%)
> Non DFS Used: 1.33 TB <<<<< !!!
> {noformat}
> We can see that YARN occupies more than 200 GB on each data node (inside .../data/yarn/usercache/xyz/appcache), for a total of 1.33 TB, which is almost the entire 1.46 TB capacity. There are more than 100 files on each data node named like `attempt_1620201940366_0003_m_000035_1_spill_37.out`, each 77 MB in size.
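>
> One way to shrink those spill files is to compress the intermediate map output. A hedged sketch using the standard MapReduce properties (Snappy is assumed to be available through the native Hadoop libraries; the remaining arguments are unchanged from the original command):
> {noformat}
> > yarn jar $HBASE_HOME/lib/hbase-mapreduce-2.2.5.jar import \
> > -Dmapreduce.map.output.compress=true \
> > -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
> > -Dmapreduce.map.speculative=false \
> > -Dmapreduce.reduce.speculative=false \
> > -Dimport.bulk.output=/HFILES \
> > -Dimport.bulk.hasLargeResult=true \
> > my_table /TMP_IMPORT
> {noformat}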
>
> So my question is: how can I use bulk import if it needs roughly 100x the input size in local disk space as shuffle cache?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)