Posted to mapreduce-user@hadoop.apache.org by Srinivas Surasani <hi...@gmail.com> on 2013/07/21 07:56:48 UTC

Hadoop streaming job failure

Hi All,

I'm running a Hadoop streaming job over 100 GB of data on a 50-node cluster.
The job succeeds on small amounts of data, but when running over the full
100 GB I get a "memory error" and a "Broken pipe" error. Each node has
enough memory available.
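
For reference, the job is launched with something along these lines (the
paths and script names below are placeholders, not the real ones; the
mapper and reducer are Python scripts):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input /data/input \
        -output /data/output \
        -mapper mapper.py \
        -reducer reducer.py \
        -file mapper.py \
        -file reducer.py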

Is there a way to increase the memory available to the Python streaming tasks?
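
For example, would passing task-memory properties on the command line be the
right approach, something like the sketch below? I'm only guessing at which
properties are relevant, and the exact names may differ between Hadoop
versions:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.child.java.opts=-Xmx2048m \
        -D mapred.child.ulimit=4194304 \
        -D mapred.job.map.memory.mb=3072 \
        -input /data/input -output /data/output \
        -mapper mapper.py -reducer reducer.py \
        -file mapper.py -file reducer.py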

Below are sample error logs:

cause:java.io.IOException: subprocess still running
R/W/S=32771708/10/0 in:34752=32771708/943 [rec/s] out:0=10/943 [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=root
HADOOP_USER=null
last Hadoop input: |null|
Broken pipe


Any help appreciated.

Thanks,
Srinivas