Posted to common-dev@hadoop.apache.org by "Albert Strasheim (JIRA)" <ji...@apache.org> on 2007/01/09 20:48:27 UTC

[jira] Commented: (HADOOP-817) Streaming reducers throw OutOfMemory for not so large inputs

    [ https://issues.apache.org/jira/browse/HADOOP-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463379 ] 

Albert Strasheim commented on HADOOP-817:
-----------------------------------------

I'm running 0.10.0, which should include the patch from HADOOP-849 (as far as I can tell), but I'm still running into OutOfMemoryErrors.

I'm following the example from this blog entry:

http://jjinux.blogspot.com/2007/01/clustering-hadoop.html

hadoop-default.xml contains:

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/state/partition1/tmp/hadoop-${user.name}</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>dominatrix.local:54310</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>dominatrix.local:54311</value>
</property>
<property> 
  <name>dfs.replication</name>
  <value>2</value>
</property>
</configuration>
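
One thing I plan to try (just a sketch for now - I'm not sure whether 0.10 reads mapred.child.java.opts or still uses an older heap property, so I need to check hadoop-default.xml first) is bumping the task JVM heap in my config, since I believe the default heap is fairly small (around 200 MB):

<property>
  <name>mapred.child.java.opts</name>
  <!-- assumption: this controls the child task JVM heap as it does in later releases -->
  <value>-Xmx512m</value>
</property>

If that makes the error go away, it at least suggests a plain heap-sizing issue rather than a leak.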

The mapred settings are left at their defaults.

I'm using an input file generated as follows:

perl -e 'for $i(1..99999999) { print "$i\t\n"; }' > input.txt

This generates a 945 MB input file. I then run:

hadoop-0.10.0/bin/hadoop jar hadoop-0.10.0/contrib/hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input input.txt -output out-dir

I am running the job across 21 nodes.
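
I haven't pasted mapper.py and reducer.py here; streaming just feeds each script lines on stdin and expects tab-separated key/value lines on stdout, so they are of this general shape (a rough sketch, not my exact scripts):

#!/usr/bin/env python
# mapper.py (sketch): emit each input number as a key with a count of 1
import sys

for line in sys.stdin:
    key = line.rstrip('\n').split('\t')[0]
    sys.stdout.write('%s\t1\n' % key)

#!/usr/bin/env python
# reducer.py (sketch): sum the counts for each key (input arrives sorted by key)
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, value = line.rstrip('\n').split('\t', 1)
    if key != current_key:
        if current_key is not None:
            sys.stdout.write('%s\t%d\n' % (current_key, total))
        current_key, total = key, 0
    total += int(value)
if current_key is not None:
    sys.stdout.write('%s\t%d\n' % (current_key, total))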

What should I try next? How can I debug this further? I'll try Java 6 in the meantime.

> Streaming reducers throw OutOfMemory for not so large inputs
> ------------------------------------------------------------
>
>                 Key: HADOOP-817
>                 URL: https://issues.apache.org/jira/browse/HADOOP-817
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Sanjay Dahiya
>         Assigned To: Sanjay Dahiya
>         Attachments: NetbeansProfie.png
>
>
> I am seeing OutOfMemoryError for moderate size inputs (~70 text files, 20k each) causing the job to fail in streaming. For very small inputs it still succeeds. Looking into the details.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira