Posted to common-dev@hadoop.apache.org by "Justin Hancock (JIRA)" <ji...@apache.org> on 2013/05/10 13:19:16 UTC

[jira] [Created] (HADOOP-9558) Opening many small files with Zlib compression results in Out of Memory Exception when using Combined Input File Format for many small files

Justin Hancock created HADOOP-9558:
--------------------------------------

             Summary: Opening many small files with Zlib compression results in Out of Memory Exception when using Combined Input File Format for many small files
                 Key: HADOOP-9558
                 URL: https://issues.apache.org/jira/browse/HADOOP-9558
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 0.21.0
         Environment: Red Hat 5.X JDK 1.6.31
            Reporter: Justin Hancock


When running a Hive job that tries to combine many small files (~20,000), tasks fail with the OutOfMemoryError detailed below. The job combines small files of approximately 1 KB each, yet the underlying Zlib native implementation allocates a 64 KB byte buffer for every one of them. The problem is suspected to reside in ZlibCompressor and the default size of the ByteBuffer it creates at construction time. Under less extreme conditions the issue does not manifest itself: tests with 5,000 files completed successfully.
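To illustrate the suspected memory pressure, the sketch below (not Hadoop code; the 64 KB figure is taken from the report, matching ZlibCompressor's default direct buffer size) estimates the aggregate buffer footprint when one compressor instance is held per input file:

```java
// Hypothetical illustration: per-instance direct buffer cost when a
// compressor is created for each of N small input files.
public class BufferFootprint {
    // ZlibCompressor allocates a 64 KB buffer at construction time,
    // regardless of how small the file being compressed is.
    static final int DIRECT_BUFFER_SIZE = 64 * 1024;

    // Total buffer bytes grows linearly with the number of open files.
    static long footprintBytes(int openFiles) {
        return (long) openFiles * DIRECT_BUFFER_SIZE;
    }

    public static void main(String[] args) {
        // 5,000 files (the passing test) vs ~20,000 files (the failing job).
        System.out.printf("5000 files  -> %d MiB%n",
                footprintBytes(5000) / (1024 * 1024));
        System.out.printf("20000 files -> %d MiB%n",
                footprintBytes(20000) / (1024 * 1024));
    }
}
```

At ~20,000 files this comes to roughly 1.25 GiB of buffer space for ~20 MB of actual data, which is consistent with the job succeeding at 5,000 files but failing at 20,000.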


FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: unable to create new native thread 
at java.lang.Thread.start0(Native Method) 
at java.lang.Thread.start(Thread.java:640) 
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3372) 
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:705) 
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:219) 
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:584) 
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:565) 
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:472) 
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:464) 
at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:80) 
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:247) 
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:235) 
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:458) 
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutWriters(FileSinkOperator.java:599) 
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:539) 
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) 
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) 
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) 
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) 
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) 
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:87) 
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) 
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:730) 
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:733) 
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:841) 
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:263) 
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:198) 
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:469) 
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417) 
at org.apache.hadoop.mapred.Child$4.run(Child.java:270) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:396) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177) 
at org.apache.hadoop.mapred.Child.main(Child.java:264)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira