Posted to user@mahout.apache.org by Sears Merritt <se...@gmail.com> on 2012/08/03 22:56:44 UTC

mahout and hadoop configuration question

Hi All,

I'm trying to run a k-means job with Mahout 0.8 on my Hadoop cluster (Cloudera's 0.20.2-cdh3u3) and am running into an odd problem: the Mahout job connects to HDFS to read and write data, but the computation only runs on a single machine, not across the cluster. To the best of my knowledge all the environment variables are configured properly, as you will see from the output below.

When I launch the job using the command line tools as follows:

bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output -c /users/merritts/clusters -k 100 -x 10

I get the following output:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB: /home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments: {--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10], --method=[mapreduce], --numClusters=[10000], --output=[/users/merritts/kmeans_output], --startPhase=[0], --tempDir=[temp]}
12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor

Has anyone run into this before? If so, how did you fix the issue?

Thanks for your time,
Sears Merritt
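
A quick way to rule out a client-side misconfiguration with this symptom is to check whether the Hadoop client is pointed at the cluster's JobTracker or at the local job runner. A minimal check, assuming the conf directory reported above (/usr/lib/hadoop/conf) and the stock Hadoop 0.20 command-line tools:

# If mapred.job.tracker resolves to "local", the driver runs jobs in-process
# via LocalJobRunner instead of submitting them to the cluster.
grep -A1 mapred.job.tracker /usr/lib/hadoop/conf/mapred-site.xml

# Ask the JobTracker for its job list; an error here means the client
# cannot reach the cluster at all.
/usr/lib/hadoop/bin/hadoop job -list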

Re: mahout and hadoop configuration question

Posted by Sears Merritt <se...@gmail.com>.
Aha! Thanks for catching that.

On Aug 3, 2012, at 3:13 PM, Sean Owen <sr...@gmail.com> wrote:

> Ah that's the ticket. The stack trace shows it is failing in the driver
> program, which runs client-side. It's not getting to launch a job.
> 
> It looks like it's running out of memory creating a new dense vector in the
> random seed generator process. I don't know anything more than that about
> why it happens, whether your input is funny, etc. but that is why it is not
> getting to Hadoop.


Re: mahout and hadoop configuration question

Posted by Sean Owen <sr...@gmail.com>.
Ah that's the ticket. The stack trace shows it is failing in the driver
program, which runs client-side. It's not getting to launch a job.

It looks like it's running out of memory creating a new dense vector in the
random seed generator process. I don't know anything more than that about
why it happens, whether your input is funny, etc. but that is why it is not
getting to Hadoop.
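
Since the OutOfMemoryError is thrown in the driver JVM on the client, not in a map or reduce task, the cluster-side memory settings don't help; the client JVM itself needs a bigger heap. A rough sketch, assuming the launcher scripts shown above and that hadoop-env.sh does not already pin HADOOP_HEAPSIZE (the variable bin/hadoop uses, in MB, to size the JVMs it starts):

# The k-means driver runs client-side under "hadoop jar"; give that JVM
# enough heap to hold the 10000 random dense seed centroids.
export HADOOP_HEAPSIZE=4000
bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output \
  -c /users/merritts/clusters -k 10000 -x 10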

On Fri, Aug 3, 2012 at 5:04 PM, Sears Merritt <se...@gmail.com> wrote:

> Exactly. There isn't an error. The job just runs on a single machine and
> eventually crashes when it exhausts the JVM's memory. I never see it show
> up in the job tracker and never get any map-reduce status output. The full
> output is here:
>
> -bash-4.1$ bin/mahout kmeans -i /users/merritts/rvs -o
> /users/merritts/kmeans_output -c /users/merritts/clusters -k 10000 -x 10
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
> HADOOP_CONF_DIR=/usr/lib/hadoop/conf
> MAHOUT-JOB:
> /home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> 12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments:
> {--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5],
> --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> --endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10],
> --method=[mapreduce], --numClusters=[10000],
> --output=[/users/merritts/kmeans_output], --startPhase=[0],
> --tempDir=[temp]}
> 12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
> 12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>         at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:54)
>         at org.apache.mahout.math.DenseVector.like(DenseVector.java:115)
>         at org.apache.mahout.math.DenseVector.like(DenseVector.java:28)
>         at
> org.apache.mahout.math.AbstractVector.times(AbstractVector.java:478)
>         at
> org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:273)
>         at
> org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:248)
>         at
> org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:93)
>         at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:94)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:48)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at
> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at
> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

Re: mahout and hadoop configuration question

Posted by Paritosh Ranjan <pr...@xebia.com>.
-k 10000

Try reducing the number of initial clusters, or use canopy clustering to find the initial clusters for you.
Right now the number of initial clusters is too high, which is most likely what is causing the OOM.
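
One possible shape of that pipeline, sketched with the paths from this thread (the canopy output path, the distance measure, and the t1/t2 thresholds are placeholders that depend on the data, and the clusters directory name should be checked against what the canopy job actually writes):

# 1) Let canopy clustering pick the initial centers (no -k needed).
bin/mahout canopy -i /users/merritts/rvs -o /users/merritts/canopies \
  -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure \
  -t1 500 -t2 250

# 2) Feed the canopy centers to k-means as its initial clusters instead of
#    having RandomSeedGenerator build 10000 random seeds in the driver.
bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output \
  -c /users/merritts/canopies/clusters-0-final -x 10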


On 04-08-2012 02:34, Sears Merritt wrote:
> -bash-4.1$ bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output -c /users/merritts/clusters -k 10000 -x 10



Re: mahout and hadoop configuration question

Posted by Sears Merritt <se...@gmail.com>.
Exactly. There isn't an error. The job just runs on a single machine and eventually crashes when it exhausts the JVM's memory. I never see it show up in the job tracker and never get any map-reduce status output. The full output is here:

-bash-4.1$ bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output -c /users/merritts/clusters -k 10000 -x 10
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB: /home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments: {--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10], --method=[mapreduce], --numClusters=[10000], --output=[/users/merritts/kmeans_output], --startPhase=[0], --tempDir=[temp]}
12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:54)
	at org.apache.mahout.math.DenseVector.like(DenseVector.java:115)
	at org.apache.mahout.math.DenseVector.like(DenseVector.java:28)
	at org.apache.mahout.math.AbstractVector.times(AbstractVector.java:478)
	at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:273)
	at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:248)
	at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:93)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:94)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:48)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:197)


 

On Aug 3, 2012, at 3:00 PM, Sean Owen <sr...@gmail.com> wrote:

> I don't see an error here...? the warning is an ignorable message from
> hadoop.
> 
> On Fri, Aug 3, 2012 at 4:56 PM, Sears Merritt <se...@gmail.com> wrote:


Re: mahout and hadoop configuration question

Posted by Sean Owen <sr...@gmail.com>.
I don't see an error here...? The warning is an ignorable message from Hadoop.
