Posted to user@mahout.apache.org by Krishnanand Khambadkone <kk...@yahoo.com> on 2012/04/19 05:00:54 UTC

Not able to run Wikipedia Bayes Example

Hi,  I am trying to run the Mahout sample in this link,

https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html

When I try to run this step,

$MAHOUT_HOME/bin/mahout  wikipediaDataSetCreator  -i wikipedia/chunks -o wikipediainput -c $MAHOUT_HOME/examples/src/test/resources/country.txt


I get the following exception. I am running the Mahout distribution from Cloudera (mahout-0.5-cdh3u3):

12/04/17 18:59:13 INFO mapred.JobClient: Task Id : attempt_201204171311_0005_m_000000_0, Status : FAILED
attempt_201204171311_0005_m_000000_0: 2012-04-17 18:59:09.221 java[4156:1d03] Unable to load realm info from SCDynamicStore
12/04/17 18:59:13 INFO mapred.JobClient: Task Id : attempt_201204171311_0005_m_000001_0, Status : FAILED
java.lang.ArrayStoreException: [C
at java.util.AbstractCollection.toArray(AbstractCollection.java:171)
at org.apache.mahout.analysis.WikipediaAnalyzer.<init>(WikipediaAnalyzer.java:38)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorMapper.setup(WikipediaDatasetCreatorMapper.java:107)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.do
attempt_201204171311_0005_m_000001_0: 2012-04-17 18:59:09.208 java[4160:1d03] Unable to load realm info from SCDynamicStore
12/04/17 18:59:18 INFO mapred.JobClient: Task Id : attempt_201204171311_0005_m_000000_1, Status : FAILED
java.lang.ArrayStoreException: [C
at java.util.AbstractCollection.toArray(AbstractCollection.java:171)
at org.apache.mahout.analysis.WikipediaAnalyzer.<init>(WikipediaAnalyzer.java:38)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorMapper.setup(WikipediaDatasetCreatorMapper.java:107)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.do
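
Background on the error itself: `[C` is the JVM's internal name for `char[]`, so `java.lang.ArrayStoreException: [C` means a `char[]` was stored into an array whose element type cannot hold it. `AbstractCollection.toArray(T[])` copies elements with runtime-checked array stores, which is where the trace above fails. A minimal, contrived sketch of the same class of failure (this is not the Mahout code):

```java
public class ArrayStoreDemo {
    public static void main(String[] args) {
        // Java arrays are covariant, so a String[] may be referenced
        // through an Object[] variable.
        Object[] slots = new String[1];
        try {
            // Storing a char[] (JVM name "[C") into a String[] slot fails
            // at runtime -- the same check that fires inside
            // AbstractCollection.toArray(T[]).
            slots[0] = "stopword".toCharArray();
        } catch (ArrayStoreException e) {
            System.out.println("caught ArrayStoreException");
        }
    }
}
```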

Re: Not able to run Wikipedia Bayes Example

Posted by Krishnanand Khambadkone <kk...@yahoo.com>.
It is running on a pseudo-distributed cluster on my laptop.



________________________________
 From: Lance Norskog <go...@gmail.com>
To: user@mahout.apache.org; Krishnanand Khambadkone <kk...@yahoo.com> 
Sent: Wednesday, April 18, 2012 8:09 PM
Subject: Re: Not able to run Wikipedia Bayes Example
 
Is this on a Hadoop cluster, or running in pseudo-distributed mode (no
cluster)?




-- 
Lance Norskog
goksron@gmail.com

Re: Not able to run Wikipedia Bayes Example

Posted by Krishnanand Khambadkone <kk...@yahoo.com>.
Lance,  Will this sample run only on a true cluster?  I am running it on a pseudo-distributed cluster (cloudera u3)  on my laptop.



________________________________
 From: Lance Norskog <go...@gmail.com>
To: user@mahout.apache.org; Krishnanand Khambadkone <kk...@yahoo.com> 
Sent: Wednesday, April 18, 2012 8:09 PM
Subject: Re: Not able to run Wikipedia Bayes Example
 
Is this on a Hadoop cluster, or running in pseudo-distributed mode (no
cluster)?




-- 
Lance Norskog
goksron@gmail.com

Re: Fw: Not able to run Wikipedia Bayes Example

Posted by Lance Norskog <go...@gmail.com>.
Hi-

The bin/mahout program uses its own Java option: MAHOUT_OPTS.
http://www.lucidimagination.com/search/?q=mahout_opts

According to this search, it is not described on the wiki; it is defined in the
bin/mahout script.
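
A minimal sketch of using MAHOUT_OPTS to raise the client JVM heap (the -Xmx value here is an arbitrary example, not a recommendation):

```shell
# bin/mahout reads MAHOUT_OPTS and passes it to the JVM it launches,
# so heap flags go here rather than in HADOOP_HEAPSIZE.
export MAHOUT_OPTS="-Xmx2g"
echo "MAHOUT_OPTS=$MAHOUT_OPTS"
# then, e.g.:
# $MAHOUT_HOME/bin/mahout testclassifier -m wikipediamodel -d wikipediainput
```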

Lance

On Fri, Apr 20, 2012 at 5:48 PM, Krishnanand Khambadkone
<kk...@yahoo.com> wrote:
> Lance,  I was able to advance further after installing and building the
> mahout-trunk.   However when I try to run the last step,
>
> $MAHOUT_HOME/bin/mahout testclassifier -m wikipediamodel -d wikipediainput
>
> I get this error. My HADOOP_HEAPSIZE is set to 2000 MB, and I have 8 GB on my
> MacBook Pro.
>
>
> Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError: Java
> heap space
>
> ________________________________
> From: Lance Norskog <go...@gmail.com>
>
> To: Krishnanand Khambadkone <kk...@yahoo.com>
> Sent: Thursday, April 19, 2012 6:54 PM
> Subject: Re: Fw: Not able to run Wikipedia Bayes Example
>
> Yikes! You're running Mahout 0.5. This is an old release. We generally
> suggest that you upgrade to the Mahout trunk. When you run this
> program in pseudo-distributed mode, the Cloudera Hadoop code will not
> be used; all Hadoop code comes from the Mahout project.
>
> Lance



-- 
Lance Norskog
goksron@gmail.com

Re: Not able to run Wikipedia Bayes Example

Posted by Lance Norskog <go...@gmail.com>.
Is this on a Hadoop cluster, or running in pseudo-distributed mode (no
cluster)?




-- 
Lance Norskog
goksron@gmail.com