Posted to common-dev@hadoop.apache.org by "Albert Strasheim (JIRA)" <ji...@apache.org> on 2007/01/09 20:48:27 UTC
[jira] Commented: (HADOOP-817) Streaming reducers throw OutOfMemory for not so large inputs
[ https://issues.apache.org/jira/browse/HADOOP-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463379 ]
Albert Strasheim commented on HADOOP-817:
-----------------------------------------
I'm running 0.10.0, which should include the patch from HADOOP-849 (as far as I can tell), but I'm still running into OutOfMemoryErrors.
I'm following the example from this blog entry:
http://jjinux.blogspot.com/2007/01/clustering-hadoop.html
hadoop-default.xml contains:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/state/partition1/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>dominatrix.local:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>dominatrix.local:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
The mapred settings are unchanged.
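To double-check which values a configuration file like the one above actually sets, a short script can list its name/value pairs. This is only a sketch: the XML is copied from this comment, and the helper name read_properties is made up for illustration, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Properties copied from the configuration quoted in this comment.
CONFIG_XML = """<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/state/partition1/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>dominatrix.local:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>dominatrix.local:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a dict mapping each <name> to its <value> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    for name, value in sorted(read_properties(CONFIG_XML).items()):
        print(f"{name} = {value}")
```

The same function could be pointed at the text of a real configuration file to confirm that no memory-related setting has been overridden by accident.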
I'm using an input file generated as follows:
perl -e 'for $i(1..99999999) { print "$i\t\n"; }' > input.txt
This generates a 945 MB input file. I then run:
hadoop-0.10.0/bin/hadoop jar hadoop-0.10.0/contrib/hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input input.txt -output out-dir
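As a sanity check on the input size quoted above, the number of bytes the perl one-liner writes can be computed without generating the file. The sketch below assumes each record is the decimal number followed by one tab and one newline, as in the command; the function name expected_size is made up for illustration.

```python
def expected_size(last):
    """Bytes written by printing "<i>\\t\\n" for every i in 1..last."""
    total = 0
    digits = 1
    start = 1
    while start <= last:
        end = min(last, start * 10 - 1)  # largest number with this many digits
        count = end - start + 1
        total += count * (digits + 2)    # digits, plus one tab, plus one newline
        digits += 1
        start *= 10
    return total


if __name__ == "__main__":
    size = expected_size(99999999)
    print(size, "bytes")
    print(round(size / 2**20, 1), "MiB")
```

For 1..99999999 this gives 988,888,887 bytes, or about 943 MiB, which is in line with the roughly 945 MB reported here.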
I am running the job across 21 nodes.
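The contents of mapper.py and reducer.py are not shown in this message. For narrowing down where memory is used, one option is to swap in scripts that hold no per-key state. The following is a hypothetical stand-in, not the scripts used in this job: an identity mapper and a reducer that counts records per key, assuming tab-separated lines that arrive at the reducer sorted by key.

```python
import sys


def identity_map(lines, out):
    """Copy each input line to the output unchanged, one line at a time."""
    for line in lines:
        out.write(line)


def count_reduce(lines, out):
    """Emit "<key>\\t<count>" per key; assumes the input is sorted by key."""
    current, count = None, 0
    for line in lines:
        key = line.rstrip("\n").split("\t", 1)[0]
        if key != current:
            if current is not None:
                out.write(f"{current}\t{count}\n")
            current, count = key, 0
        count += 1
    if current is not None:
        out.write(f"{current}\t{count}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: "script.py map" or "script.py reduce".
    role = sys.argv[1]
    (identity_map if role == "map" else count_reduce)(sys.stdin, sys.stdout)
```

Both functions keep only the current key and a counter in memory, so if the job still fails with scripts like these, the scripts themselves are unlikely to be the cause.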
What next? How do I debug this problem further? I'll try Java 6 in the meantime.
> Streaming reducers throw OutOfMemory for not so large inputs
> ------------------------------------------------------------
>
> Key: HADOOP-817
> URL: https://issues.apache.org/jira/browse/HADOOP-817
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/streaming
> Reporter: Sanjay Dahiya
> Assigned To: Sanjay Dahiya
> Attachments: NetbeansProfie.png
>
>
> I am seeing OutOfMemoryError for moderate-size inputs (~70 text files, 20k each), causing the job to fail in streaming. For very small inputs it still succeeds. Looking into details.