Posted to user@mahout.apache.org by Jia Rao <ri...@gmail.com> on 2011/01/25 03:54:45 UTC

Problem running twenty newsgroup example in a hadoop cluster

Hi all,

I am having a problem running the 20 Newsgroups example on a Hadoop cluster.
The trainclassifier step works fine, but testclassifier fails with an
"out of memory java heap" error.

Here is the configuration of the Hadoop cluster:

Physical machines: 4 nodes, each with 6 GB of memory.

Hadoop: 0.20.2, with HADOOP_HEAPSIZE=3200 in hadoop-env.sh and
mapred.child.java.opts=-Xmx1024M in mapred-site.xml.

Mahout: tried release 0.4 and the latest source; same problem with both.

Command line arguments used:

$MAHOUT_HOME/bin/mahout testclassifier \
  -m newsmodel \
  -d 20news-input \
  -type bayes \
  -ng 3 \
  -source hdfs \
  -method mapreduce


Any ideas?
Thanks!

Re: Problem running twenty newsgroup example in a hadoop cluster

Posted by james q <ja...@gmail.com>.
Hey,

Did you ever figure this issue out?

From my experience with Hadoop, you can tune how memory is split across your
cluster. According to
http://getsatisfaction.com/cloudera/topics/how_much_ram_datanode_should_take,
HADOOP_HEAPSIZE sets the heap size of the Hadoop daemons (DataNode,
TaskTracker), while mapred.child.java.opts controls the heap size of the
child JVMs (the map and reduce tasks themselves).

So maybe you could set HADOOP_HEAPSIZE to 1 GB and
mapred.child.java.opts=-Xmx3072M (3 GB). That way your map tasks would have
more memory to work with.
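A rough way to check whether a given split fits on a 6 GB node is to add up the worst-case heap ceilings of every JVM on a worker. The sketch below assumes (this is not stated in the thread) that each worker runs a DataNode and a TaskTracker, each started with an -Xmx of HADOOP_HEAPSIZE megabytes, and that the TaskTracker keeps the Hadoop 0.20 defaults of 2 map slots and 2 reduce slots, with each child JVM getting the -Xmx from mapred.child.java.opts. The helper names node_heap_ceiling and report are hypothetical.

```shell
#!/bin/sh
# Worst-case heap ceiling (MB) on one worker node.
# ASSUMPTIONS (not from the thread): a DataNode and a TaskTracker each run
# with -Xmx${HADOOP_HEAPSIZE}m, and every map/reduce slot runs a child JVM
# with the -Xmx taken from mapred.child.java.opts.
node_heap_ceiling() {
  daemon_mb=$1; child_mb=$2; map_slots=$3; reduce_slots=$4
  daemons=2  # DataNode + TaskTracker
  echo $(( daemons * daemon_mb + (map_slots + reduce_slots) * child_mb ))
}

physical_mb=6144  # 6 GB per node, as reported in the original post

# report DAEMON_MB CHILD_MB, assuming the 0.20 defaults of 2 map + 2 reduce slots
report() {
  total=$(node_heap_ceiling "$1" "$2" 2 2)
  if [ "$total" -gt "$physical_mb" ]; then verdict="exceeds"; else verdict="fits in"; fi
  echo "daemons=${1}MB child=${2}MB -> ${total}MB ceiling, ${verdict} ${physical_mb}MB"
}

report 3200 1024   # settings from the original post
report 1024 3072   # the 1 GB / 3 GB split suggested above
```

Under these assumptions both layouts add up to more than 6 GB per node when every slot is busy, so a larger child heap would probably also need fewer slots per node (mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum). -Xmx is only a ceiling, and a JVM's resident size is somewhat larger than its heap, so treat the totals as a rough guide rather than a prediction of failure.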

-- james


Re: Problem running twenty newsgroup example in a hadoop cluster

Posted by Jia Rao <ri...@gmail.com>.
Sorry, 3200m
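On the units question: in Hadoop 0.20 the variable that hadoop-env.sh actually sets is HADOOP_HEAPSIZE (no underscore between HEAP and SIZE), and it takes a bare number of megabytes; bin/hadoop appends the "m" itself when it builds the daemon's -Xmx flag. The sketch below reproduces that logic from memory (check the bin/hadoop shipped with your install); heap_flag is a hypothetical helper name. mapred.child.java.opts is different: it is passed straight to the child JVM, so there the unit suffix belongs in the value, as in -Xmx1024M.

```shell
#!/bin/sh
# Sketch of how Hadoop 0.20's bin/hadoop derives the daemon heap flag from
# HADOOP_HEAPSIZE (reconstructed from memory; check your own bin/hadoop).
# The value is a bare number of megabytes: the script appends the "m".
heap_flag() {
  HADOOP_HEAPSIZE=$1
  JAVA_HEAP_MAX=-Xmx1000m  # default when HADOOP_HEAPSIZE is unset or empty
  if [ "$HADOOP_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m"
  fi
  echo "$JAVA_HEAP_MAX"
}

heap_flag 3200    # prints -Xmx3200m  (3200 MB, as intended)
heap_flag 3200m   # prints -Xmx3200mm (malformed; the JVM should refuse to start)
heap_flag ""      # prints -Xmx1000m  (the script's built-in default)
```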

On Tue, Jan 25, 2011 at 12:40 AM, Ted Dunning <te...@gmail.com> wrote:

> 3200?
>
> or 3200m?

Re: Problem running twenty newsgroup example in a hadoop cluster

Posted by Ted Dunning <te...@gmail.com>.
3200?

or 3200m?

