Posted to mapreduce-user@hadoop.apache.org by "Felix.徐" <yg...@gmail.com> on 2012/07/16 16:22:19 UTC

CombineFileInputFormat ran out of memory while making splits

Hi all,
I have written a MyCombineFileInputFormat that extends
CombineFileInputFormat. It puts multiple files together into the same
input split, and it works fine for a small number of files. But when I try
to process 100,000 small files, CombineFileInputFormat runs out of memory
while computing the input splits.
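
For reference, a minimal subclass of this kind might look like the sketch
below (old org.apache.hadoop.mapred API, as in the stack trace; everything
beyond the MyCombineFileInputFormat name, in particular the SingleFileReader
delegate, is an illustrative assumption, not the actual code from this
thread):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

// Packs many small files into each input split (old mapred API).
public class MyCombineFileInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf conf, Reporter reporter) throws IOException {
    // CombineFileRecordReader creates one delegate reader per file
    // contained in the combined split.
    return new CombineFileRecordReader<LongWritable, Text>(conf,
        (CombineFileSplit) split, reporter, (Class) SingleFileReader.class);
  }

  // Reads the lines of one file in the combined split. The
  // (CombineFileSplit, Configuration, Reporter, Integer) constructor
  // signature is required by CombineFileRecordReader.
  public static class SingleFileReader
      implements RecordReader<LongWritable, Text> {
    private final LineRecordReader lines;

    public SingleFileReader(CombineFileSplit split, Configuration conf,
        Reporter reporter, Integer idx) throws IOException {
      lines = new LineRecordReader(conf, new FileSplit(split.getPath(idx),
          split.getOffset(idx), split.getLength(idx), split.getLocations()));
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      return lines.next(key, value);
    }
    public LongWritable createKey() { return lines.createKey(); }
    public Text createValue() { return lines.createValue(); }
    public long getPos() throws IOException { return lines.getPos(); }
    public float getProgress() throws IOException { return lines.getProgress(); }
    public void close() throws IOException { lines.close(); }
  }
}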

The stack trace is:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2245)
  at java.util.Arrays.copyOf(Arrays.java:2219)
  at java.util.ArrayList.grow(ArrayList.java:213)
  at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:187)
  at java.util.ArrayList.addAll(ArrayList.java:532)
  at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getHosts(CombineFileInputFormat.java:568)
  at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:410)
  at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
  at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
  at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
  at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
  at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
  at ac.ict.mapreduce.test.MR.main(MR.java:114)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

It seems that the rack-to-node mapping built by getHosts() is too big.
How can I solve this problem? Thanks!

Re: CombineFileInputFormat ran out of memory while making splits

Posted by "Felix.徐" <yg...@gmail.com>.
My Hadoop version is 1.0.1, and I didn't specify any parameters.
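
The trace shows the OutOfMemoryError is thrown in the job client JVM
(thread "main", inside JobClient.writeOldSplits), while getSplits() is
grouping the blocks of all 100,000 files at once. One hedged mitigation,
assuming the split-size settings that Hadoop 1.x
CombineFileInputFormat.getSplits() reads (mapred.max.split.size and the
per-node/per-rack minimums), is to cap the combined split size so splits
can be emitted node by node and rack by rack instead of pooling all block
locations globally; failing that, the client JVM can simply be given a
larger heap. The values below are illustrative assumptions, not tested
settings:

// In the job driver (MR.main in the trace above), assuming the usual
// old-API setup:
JobConf conf = new JobConf(MR.class);
conf.setInputFormat(MyCombineFileInputFormat.class);
// Cap each combined split at 256 MB and set per-node/per-rack minimums,
// so getSplits() groups blocks node-by-node and rack-by-rack rather
// than accumulating one global pool of block locations.
conf.setLong("mapred.max.split.size", 256L * 1024 * 1024);
conf.setLong("mapred.min.split.size.per.node", 128L * 1024 * 1024);
conf.setLong("mapred.min.split.size.per.rack", 128L * 1024 * 1024);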
