Posted to common-user@hadoop.apache.org by Michael Di Domenico <md...@gmail.com> on 2008/01/30 22:44:57 UTC

rand-sort example

I'm trying to run the rand-sort benchmark on my cluster, but I seem to
be running out of heap space.

I changed the heap parameter in hadoop-env.sh to HADOOP_HEAPSIZE=3000;
did I not change the right parameter?


[hadoop@tsg142 hadoop]$ bin/hadoop jar hadoop-*-examples.jar sort rand rand-sort
Running on 22 nodes to sort from /user/hadoop/rand into /user/hadoop/rand-sort with 44 reduces.
Job started: Wed Jan 30 16:37:53 EST 2008
08/01/30 16:37:54 INFO mapred.FileInputFormat: Total input paths to process : 220
08/01/30 16:38:00 INFO mapred.JobClient: Running job: job_200801301636_0001
08/01/30 16:38:01 INFO mapred.JobClient:  map 0% reduce 0%
08/01/30 16:38:11 INFO mapred.JobClient: Task Id : task_200801301636_0001_m_000017_0, Status : FAILED
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Unknown Source)
        at java.io.ByteArrayOutputStream.write(Unknown Source)
        at java.io.DataOutputStream.write(Unknown Source)
        at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:137)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:349)
        at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1804)

Re: rand-sort example

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On Jan 30, 2008, at 3:46 PM, Doug Cutting wrote:

> Arun C Murthy wrote:
>> I guess we need to bump up the default from 200 to 512, what do  
>> others think?
>
> Perhaps instead we should try to figure out why it now takes more  
> than 200MB to run IdentityMapper?
>

As I mentioned in HADOOP-2751, my best guess is *io.sort.mb* (set to
100 MB), which lets the map-side sort buffer claim half of the default
200 MB child heap.

Maybe the fix is to lower it, or to fix HADOOP-1867, which to my mind
is the better path.
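
For anyone who wants to try the first option, a minimal sketch of lowering
the sort buffer follows. It assumes site overrides live in
conf/hadoop-site.xml, and the value 50 is only an illustrative guess, not a
tested recommendation; io.sort.mb is given in megabytes.

```xml
<!-- conf/hadoop-site.xml (assumed location of site overrides) -->
<property>
  <name>io.sort.mb</name>
  <!-- Illustrative value only: half of the 100 MB mentioned above. -->
  <value>50</value>
</property>
```

A smaller buffer should leave more headroom in a 200 MB child heap, at the
likely cost of more spills to disk during the map phase.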

Arun


Re: rand-sort example

Posted by Doug Cutting <cu...@apache.org>.
Arun C Murthy wrote:
> I guess we need to bump up the default from 200 to 512, what do others 
> think?

Perhaps instead we should try to figure out why it now takes more than 
200MB to run IdentityMapper?

Doug

Re: rand-sort example

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On Jan 30, 2008, at 1:44 PM, Michael Di Domenico wrote:

> I'm trying to run the rand-sort benchmark on my cluster, but I seem to
> be running out of heap space.
>
> I changed the heap parameter in hadoop-env.sh to HADOOP_HEAPSIZE=3000;
> did I not change the right parameter?
>

HADOOP_HEAPSIZE controls the heap size for the Hadoop daemons.  
However, the OOM you see is in the child MapTask.

Set mapred.child.java.opts to raise the child task JVM's heap from
200M to 512M; that should work.
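
As a rough sketch of what that override could look like, assuming it goes
in conf/hadoop-site.xml on the machine submitting the job (the file
location is an assumption about your setup; the property name and the 512M
figure are the ones above):

```xml
<!-- conf/hadoop-site.xml (assumed location of site overrides) -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- JVM options for each child map/reduce task; raises the heap to 512 MB. -->
  <value>-Xmx512m</value>
</property>
```

This changes only the task JVMs; HADOOP_HEAPSIZE in hadoop-env.sh can go
back to whatever the daemons actually need.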

I guess we need to bump up the default from 200 to 512, what do  
others think?

Arun

>
> [hadoop@tsg142 hadoop]$ bin/hadoop jar hadoop-*-examples.jar sort rand rand-sort
> Running on 22 nodes to sort from /user/hadoop/rand into /user/hadoop/rand-sort with 44 reduces.
> Job started: Wed Jan 30 16:37:53 EST 2008
> 08/01/30 16:37:54 INFO mapred.FileInputFormat: Total input paths to process : 220
> 08/01/30 16:38:00 INFO mapred.JobClient: Running job: job_200801301636_0001
> 08/01/30 16:38:01 INFO mapred.JobClient:  map 0% reduce 0%
> 08/01/30 16:38:11 INFO mapred.JobClient: Task Id : task_200801301636_0001_m_000017_0, Status : FAILED
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Unknown Source)
>         at java.io.ByteArrayOutputStream.write(Unknown Source)
>         at java.io.DataOutputStream.write(Unknown Source)
>         at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:137)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:349)
>         at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1804)