Posted to user@mahout.apache.org by Mike Roberts <su...@gmail.com> on 2010/05/19 01:27:50 UTC

Error Running Frequent Itemset Mining Example

Hey Guys,

Just trying to get the example mentioned here working:
https://cwiki.apache.org/MAHOUT/parallelfrequentpatternmining.html.

I downloaded the accidents.dat file and placed it in
/home/ubuntu/mahout-in/fpm-input.
I created a directory for the output as /home/ubuntu/mahout-in/fpm-out.
Then, I ran the following command:
./bin/mahout fpg --input /home/ubuntu/mahout-in/fpm-input --output /home/ubuntu/mahout-in/fpm-out --method mapreduce

It runs for a bit and after the first step I get the following error:

java.io.IOException: java.lang.ClassNotFoundException: org.apache.mahout.common.Pair
        at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:55)
        at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:36)
        at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
        at org.apache.mahout.fpm.pfpgrowth.PFPGrowth.deserializeList(PFPGrowth.java:84)
        at org.apache.mahout.fpm.pfpgrowth.TransactionSortingMapper.setup(TransactionSortingMapper.java:77)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)


The step that it was running:
10/05/18 23:10:18 INFO pfpgrowth.PFPGrowth: No of Features: 30
10/05/18 23:10:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
10/05/18 23:10:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to process : 1
10/05/18 23:10:19 INFO mapred.JobClient: Running job: job_local_0002
10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to process : 1
10/05/18 23:10:19 INFO mapred.MapTask: io.sort.mb = 100
10/05/18 23:10:19 INFO mapred.MapTask: data buffer = 79691776/99614720
10/05/18 23:10:19 INFO mapred.MapTask: record buffer = 262144/327680
10/05/18 23:10:19 WARN mapred.LocalJobRunner: job_local_0002

Anyone know what's going on here, or have a solution?  I verified that the
source file (Pair.java) exists in
/trunk/core/src/main/java/org/apache/mahout/common.  I did an mvn install in
core just to be sure.  I'm running Hadoop 0.20.2 on Ubuntu 10.04 on EC2.  BTW,
if it's not obvious, I'm a total Mahout n00b.

Thanks,

Mike
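
The thread never pins down the cause of the ClassNotFoundException, but one thing worth ruling out is whether the jar that was built actually contains the class. The sketch below lists a jar's entries with unzip and looks for the class file. The function names and the jar path in the usage comment are made up for illustration; only the class name comes from the stack trace above.

```shell
#!/bin/sh
# Sketch: check whether a jar contains a given class.
# Uses `unzip -l` to list the jar's entries (a jar is a zip archive).

# Turn a fully qualified class name into its entry path inside a jar,
# e.g. org.apache.mahout.common.Pair -> org/apache/mahout/common/Pair.class
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# jar_has_class JAR CLASSNAME: succeeds if the jar listing mentions the entry.
jar_has_class() {
    unzip -l "$1" | grep -qF "$(class_to_entry "$2")"
}

# Example (hypothetical jar path; adjust to wherever your build puts the jar):
# jar_has_class core/target/mahout-core-VERSION.jar org.apache.mahout.common.Pair \
#     && echo "Pair is in the jar" || echo "Pair is missing from the jar"
```

If the class is in the jar, the next suspect would be how the jar reaches the task classpath, which this check cannot tell you.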

Re: Error Running Frequent Itemset Mining Example

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

It's always nice to get the same answer more than once. Now to see if I 
can make it use more than a single mapper and reducer. It clearly won't 
scale without this.

On the Hadoop quirk, I've noticed that the datanodes can take up to a 
minute to come on line after start-all. During that time I also see the 
replication 0 error on puts. I suspect that the extra time taken to type 
an extra -ls or open a browser on 50070 might be all it takes to get the 
data nodes fully running and connected.
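
If the datanodes really do need a moment after start-all, a script can wait for them instead of relying on an extra -ls. The sketch below polls `hadoop dfsadmin -report` and only continues once at least one datanode is reported. It assumes the report contains a summary line of the form "Datanodes available: N (...)", which is how Hadoop 0.20 prints it as far as I know; the function names, retry count and delay are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: wait for HDFS datanodes to register before copying files in.

# Read `hadoop dfsadmin -report` output on stdin and print the live datanode
# count. Assumes a summary line like "Datanodes available: 1 (1 total, 0 dead)".
live_datanodes() {
    sed -n 's/^Datanodes available: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# wait_for_datanodes [TRIES] [DELAY_SECONDS]: poll until a datanode is live.
wait_for_datanodes() {
    tries=${1:-12}
    delay=${2:-5}
    while [ "$tries" -gt 0 ]; do
        count=$("$HADOOP_HOME/bin/hadoop" dfsadmin -report 2>/dev/null | live_datanodes)
        if [ "${count:-0}" -gt 0 ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    echo "no datanodes came up" >&2
    return 1
}

# Usage, with the paths from this thread:
# wait_for_datanodes && "$HADOOP_HOME/bin/hadoop" dfs -put /home/mahout-in/fpm-input/* /fpm-input
```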

On 5/21/10 2:16 PM, Mike Roberts wrote:
> It's ALIVE!!! And I got the same count: 359.  Recap: Setting
> mapred.child.java.opts in mapred-site.xml was the thing that worked. So,
> shakka-kahn!
>
> Found this exciting Hadoop quirk:
> The following lines run in sequence produce an error:
> $HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
> $HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input
> -->  Error= ..."could only be replicated to 0 nodes instead of 1 hadoop"...
>
> I found this page:
> http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment  which
> suggested a high-level workaround of loading a status page.
>
> I found a simple solution that worked:
> $HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
> $HADOOP_HOME/bin/hadoop dfs -ls /fpm-input  <-- this somehow makes hadoop
> stop complaining.
> $HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input
>
>

Re: Error Running Frequent Itemset Mining Example

Posted by Mike Roberts <su...@gmail.com>.

It's ALIVE!!! And I got the same count: 359.  Recap: Setting
mapred.child.java.opts in mapred-site.xml was the thing that worked. So,
shakka-kahn!
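
For anyone searching later, this is roughly what that setting looks like. The small helper below just prints the property block to paste inside the <configuration> element of conf/mapred-site.xml. The property name is the one from this thread; writing the 2g heap as the JVM flag -Xmx2g is an assumption about what "2g" means here, and the helper name is made up.

```shell
#!/bin/sh
# Sketch: print the mapred-site.xml property that sets the child task heap.
# child_opts_property SIZE, where SIZE is a JVM heap size such as 2g or 1024m.
child_opts_property() {
    cat <<EOF
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx$1</value>
</property>
EOF
}

# Print the block for a 2 GB child heap.
child_opts_property 2g
```

The task JVMs only pick this up for jobs submitted after the change, so rerun the fpg job once the file is edited.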

Found this exciting Hadoop quirk:
The following lines run in sequence produce an error:
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input
--> Error= ..."could only be replicated to 0 nodes instead of 1 hadoop"...

I found this page:
http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment  which
suggested a high-level workaround of loading a status page.

I found a simple solution that worked:
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -ls /fpm-input  <-- this somehow makes hadoop
stop complaining.
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input

On Fri, May 21, 2010 at 11:08 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:

> I think the hadoop-config.sh heap size only affects heaps on the Hadoop
> daemons. I had it at 2g earlier when I was getting the OMEs on fpg. I added
> a note to the wiki page about setting mapred.child.java.opts to 2g and also
> to remove the other config values that were set for single-node operation
> (esp dfs.replication=1).
>
> I assume you added HADOOP_CONF_DIR so it actually runs in Hadoop?

Re: Error Running Frequent Itemset Mining Example

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

I think the hadoop-config.sh heap size only affects heaps on the Hadoop 
daemons. I had it at 2g earlier when I was getting the OMEs on fpg. I 
added a note to the wiki page about setting mapred.child.java.opts to 2g 
and also to remove the other config values that were set for single-node 
operation (esp dfs.replication=1).

I assume you added HADOOP_CONF_DIR so it actually runs in Hadoop?
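
One way to tell whether a run actually went to the cluster is to look at the job log: the log in the first message of this thread shows LocalJobRunner and a job id starting with job_local_, which means Hadoop ran the job in-process. The sketch below checks a saved log for those markers. The function name, the log file name and the conf path in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: detect from a job log whether Hadoop used its local, in-process runner.

# ran_locally: read a log on stdin; succeed if it shows local-runner markers.
ran_locally() {
    grep -q -e 'LocalJobRunner' -e 'job_local_'
}

# Usage (paths are assumptions; rerun the fpg command from this thread):
# export HADOOP_CONF_DIR="$HADOOP_HOME/conf"
# ./bin/mahout fpg --input /fpm-input --output patterns --method mapreduce 2>&1 | tee fpg.log
# if ran_locally < fpg.log; then echo "warning: job ran in local mode"; fi
```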


On 5/21/10 10:52 AM, Mike Roberts wrote:
> Exciting.  Yeah, I also set my heapsize to 2G.  I set it in the
> hadoop-config.sh file.  Did you do it there or did you instead set it in
> /conf/mapred-site.xml -->    mapred.child.java.opts?  That'd be my next step
> if I were actually getting memory errors, but wasn't even sure that real
> data could be produced.
>
> Kinda scary that it'll exit successfully without results.  Does mahout ever
> return "wrong" results?  That is, there should be 120,000 results, but
> because of some memory config somewhere it successfully returns just 100,000
> results?  Anyone ever see that, and if so, how do you deal with it?

Re: Error Running Frequent Itemset Mining Example

Posted by Mike Roberts <su...@gmail.com>.

Exciting.  Yeah, I also set my heapsize to 2G.  I set it in the
hadoop-config.sh file.  Did you do it there or did you instead set it in
/conf/mapred-site.xml -->   mapred.child.java.opts?  That'd be my next step
if I were actually getting memory errors, but wasn't even sure that real
data could be produced.

Kinda scary that it'll exit successfully without results.  Does mahout ever
return "wrong" results?  That is, there should be 120,000 results, but
because of some memory config somewhere it successfully returns just 100,000
results?  Anyone ever see that, and if so, how do you deal with it?

On Fri, May 21, 2010 at 10:36 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:

> On 5/20/10 9:51 PM, Mike Roberts wrote:
>
>> ./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000
>>
> After reconfiguring a 4-node cluster to set the java heapsize to 2g I got
> 92144 in patterns/fpgrowth/part-r-00000 and got Count: 359 and volumes of
> output after seqdumper. But it's only using a single mapper/reducer in all
> the steps (probably why it OMEs with the default heap). I also tried Drew's
> -Dmapred.reduce.tasks=2 trick but bin/mahout barfs on that.
>

Re: Error Running Frequent Itemset Mining Example

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

On 5/20/10 9:51 PM, Mike Roberts wrote:
> ./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000
After reconfiguring a 4-node cluster to set the java heapsize to 2g I 
got 92144 in patterns/fpgrowth/part-r-00000 and got Count: 359 and 
volumes of output after seqdumper. But it's only using a single 
mapper/reducer in all the steps (probably why it OMEs with the default 
heap). I also tried Drew's -Dmapred.reduce.tasks=2 trick but bin/mahout 
barfs on that.

Re: Error Running Frequent Itemset Mining Example

Posted by Mike Roberts <su...@gmail.com>.

Okay, got it "working" -- sort of. No errors, but no actual output either.

I installed a new AMI per the wiki article mentioned in this thread.  Worked
like a champ.

Then, I tried to run the sample dataset "accidents.dat" using the fpg
algorithm.  I used the exact command line I found here on Grant Ingersoll's
site:
http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/

 ./bin/mahout fpg -i /home/mahout-in/fpm-input/accidents.dat -o patterns -k 50 -method mapreduce -g 10 -regex [\ ]

Okay.  So the job "completed" "successfully".  But, the folder that should
contain the final output (./patterns/fpgrowth/part-r-00000) is only *121
bytes total*.

I then ran the next command (also copied from the same blog article above):
 ./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000

That also ran "successfully", with this result:

Input Path: patterns/fpgrowth/part-r-00000
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns
Count: 0

So, what's the haps?  Can anybody produce real output using my command line
and data?
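
Since the job can exit cleanly with nothing in the output, it may help to make a script fail when the dump is empty. The sketch below reads seqdumper output and checks the number on the "Count:" line shown above. The function names and messages are made up for illustration, and it assumes the line always has the form "Count: N".

```shell
#!/bin/sh
# Sketch: fail when a seqdumper run reports no records.

# pattern_count: read seqdumper output on stdin, print the number after "Count:".
pattern_count() {
    sed -n 's/^Count: *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# require_patterns: succeed only if the dump on stdin reports at least one record.
require_patterns() {
    count=$(pattern_count)
    if [ "${count:-0}" -gt 0 ]; then
        echo "ok: $count patterns"
    else
        echo "no patterns found; check heap settings and task logs" >&2
        return 1
    fi
}

# Usage with the command from this thread:
# ./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000 | require_patterns
```

This only catches the empty case. It cannot tell a complete result from a partial one.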

Thanks.

P.S. For the purpose of loosely documenting (and preserving for
search) an issue I ran into and solved: I ran into a Java heap out-of-memory
error while running this.  Then, that put the Hadoop datanode into
safemode, which essentially stopped the datanode from running when I did a
stop-all.sh and start-all.sh.  So, the workaround was to blow away and
reformat the datanode (this wipes everything stored in HDFS), like so:

$HADOOP_HOME/bin/stop-all.sh
rm -r /usr/local/hadoop-data/*   <-- location set in hdfs-site.xml
$HADOOP_HOME/bin/hadoop namenode -format
$HADOOP_HOME/bin/hadoop datanode -format
$HADOOP_HOME/bin/start-all.sh
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input




On Tue, May 18, 2010 at 7:40 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:

> Ya, I put it up last Thursday. Thought it might come in handy :). Of
> course, it runs LDA, k-Means, Dirichlet too, though LDA is taking *forever*
> to run build-reuters.sh with a single mapper. Gotta look into that next...
>
> Jeff

Re: Error Running Frequent Itemset Mining Example

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

Ya, I put it up last Thursday. Thought it might come in handy :). Of 
course, it runs LDA, k-Means, Dirichlet too, though LDA is taking 
*forever* to run build-reuters.sh with a single mapper. Gotta look into 
that next...

Jeff

On 5/18/10 7:11 PM, Mike Roberts wrote:
> Ah, nice!  That's new -- er, very recently updated.  Cool.  Thanks.

Re: Error Running Frequent Itemset Mining Example

Posted by Mike Roberts <su...@gmail.com>.

Ah, nice!  That's new -- er, very recently updated.  Cool.  Thanks.

On Tue, May 18, 2010 at 6:50 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:

> I'm running on a Cloudera Ubuntu based AMI that I subsequently configured
> as in https://cwiki.apache.org/confluence/display/MAHOUT/MahoutEC2
>
> Jeff

Re: Error Running Frequent Itemset Mining Example

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

I'm running on a Cloudera Ubuntu based AMI that I subsequently 
configured as in 
https://cwiki.apache.org/confluence/display/MAHOUT/MahoutEC2

Jeff


On 5/18/10 6:37 PM, Mike Roberts wrote:
> Nuts, and I was just about to finish my
>
> *"A Complete Newb’s Guide to (Installing on EC2) and Actually Running Mahout
> from the Command Line" *wiki post.
>
> Now, I'll have to see where I went wrong.  Which distro are you running?  I
> started with an Alestic Ubuntu 10.4 AMI (ami-cb97c68e).
>
> On Tue, May 18, 2010 at 5:34 PM, Jeff Eastman<jd...@windwardsolutions.com>wrote:
>
>    
>> I also brought up a single instance at
>> http://ec2-184-73-30-93.compute-1.amazonaws.com:50030/jobtracker.jsp and
>> that ran fine too. It looks to me like the problem, whatever it is, is in
>> your AMI or its configuration.
>>
>> Jeff
>>
>>
>>
>> On 5/18/10 5:15 PM, Jeff Eastman wrote:
>>
>>      
>>> Welll, I just brought up a 2 node cluster at
>>> http://ec2-174-129-148-227.compute-1.amazonaws.com:50030/jobtracker.jsp and it ran fine.
>>>
>>>
>>> On 5/18/10 4:56 PM, Mike Roberts wrote:
>>>
>>>        
>>>> Single instance.  Thx.
>>>>
>>>> On Tue, May 18, 2010 at 4:49 PM, Jeff Eastman<jdog@windwardsolutions.com
>>>>          
>>>>> wrote:
>>>>>            
>>>>   Hi Mike,
>>>>          
>>>>> Shouldn't happen. You running this on a single instance or on a hadoop
>>>>> cluster? I will see if I can duplicate.
>>>>>
>>>>> Jeff
>>>>>
>>>>>
>>>>> On 5/18/10 4:27 PM, Mike Roberts wrote:
>>>>>
>>>>>   Hey Guys,
>>>>>            
>>>>>> Just trying to get the example mentioned here working:
>>>>>> https://cwiki.apache.org/MAHOUT/parallelfrequentpatternmining.html.
>>>>>>
>>>>>> I downloaded the accidents.dat file and placed it in
>>>>>> /home/ubuntu/mahout-in/fpm-input.
>>>>>> I created a directory for the output as /home/ubuntu/mahout-in/fpm-out.
>>>>>> Then, I ran the following command:
>>>>>> ./bin/mahout fpg --input /home/ubuntu/mahout-in/fpm-input --output
>>>>>> /home/ubuntu/mahout-in/fpm-out --method mapreduce
>>>>>>
>>>>>> It runs for a bit and after the first step I get the following error:
>>>>>>
>>>>>> java.io.IOException: java.lang.ClassNotFoundException:
>>>>>> org.apache.mahout.common.Pair
>>>>>>          at
>>>>>>
>>>>>> org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:55)
>>>>>>
>>>>>>          at
>>>>>>
>>>>>> org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:36)
>>>>>>
>>>>>>          at
>>>>>>
>>>>>> org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
>>>>>>
>>>>>>          at
>>>>>>
>>>>>> org.apache.mahout.fpm.pfpgrowth.PFPGrowth.deserializeList(PFPGrowth.java:84)
>>>>>>
>>>>>>          at
>>>>>>
>>>>>> org.apache.mahout.fpm.pfpgrowth.TransactionSortingMapper.setup(TransactionSortingMapper.java:77)
>>>>>>
>>>>>>          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>>>          at
>>>>>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>>>          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>>          at
>>>>>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>>
>>>>>>
>>>>>>
>>>>>> The step that it was running:
>>>>>> 10/05/18 23:10:18 INFO pfpgrowth.PFPGrowth: No of Features: 30
>>>>>> 10/05/18 23:10:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
>>>>>> with
>>>>>> processName=JobTracker, sessionId= - already initialized
>>>>>> 10/05/18 23:10:18 WARN mapred.JobClient: Use GenericOptionsParser for
>>>>>> parsing the arguments. Applications should implement Tool for the same.
>>>>>> 10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to
>>>>>> process
>>>>>> :
>>>>>> 1
>>>>>> 10/05/18 23:10:19 INFO mapred.JobClient: Running job: job_local_0002
>>>>>> 10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to
>>>>>> process
>>>>>> :
>>>>>> 1
>>>>>> 10/05/18 23:10:19 INFO mapred.MapTask: io.sort.mb = 100
>>>>>> 10/05/18 23:10:19 INFO mapred.MapTask: data buffer = 79691776/99614720
>>>>>> 10/05/18 23:10:19 INFO mapred.MapTask: record buffer = 262144/327680
>>>>>> 10/05/18 23:10:19 WARN mapred.LocalJobRunner: job_local_0002
>>>>>>
>>>>>> Anyone know what's going on here, or have a solution?  I verified that
>>>>>> the
>>>>>> class file (Pair.Java) exists in
>>>>>> /trunk/core/src/main/java/org/apache/mahout/common.  I did an mvn
>>>>>> install
>>>>>> in
>>>>>> core just to be sure.  I'm running Hadoop 20.2 on Ubuntu 10.4 on EC2.
>>>>>>   BTW,
>>>>>> if it's not obvious, I'm a total Mahout n00b.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>>            
>>>
>>>        
>>      
>

Re: Error Running Frequent Itemset Mining Example

Posted by Mike Roberts <su...@gmail.com>.

Nuts, and I was just about to finish my
"A Complete Newb’s Guide to (Installing on EC2) and Actually Running Mahout
from the Command Line" wiki post.

Now I'll have to see where I went wrong.  Which distro are you running?  I
started with an Alestic Ubuntu 10.04 AMI (ami-cb97c68e).

On Tue, May 18, 2010 at 5:34 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:

> I also brought up a single instance at
> http://ec2-184-73-30-93.compute-1.amazonaws.com:50030/jobtracker.jsp and
> that ran fine too. It looks to me like the problem, whatever it is, is in
> your AMI or its configuration.
>
> Jeff

Re: Error Running Frequent Itemset Mining Example

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

I also brought up a single instance at 
http://ec2-184-73-30-93.compute-1.amazonaws.com:50030/jobtracker.jsp and 
that ran fine too. It looks to me like the problem, whatever it is, is 
in your AMI or its configuration.

Jeff


On 5/18/10 5:15 PM, Jeff Eastman wrote:
> Well, I just brought up a 2-node cluster at
> http://ec2-174-129-148-227.compute-1.amazonaws.com:50030/jobtracker.jsp and
> it ran fine.

Re: Error Running Frequent Itemset Mining Example

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

Well, I just brought up a 2-node cluster at
http://ec2-174-129-148-227.compute-1.amazonaws.com:50030/jobtracker.jsp
and it ran fine.


On 5/18/10 4:56 PM, Mike Roberts wrote:
> Single instance.  Thx.

Re: Error Running Frequent Itemset Mining Example

Posted by Mike Roberts <su...@gmail.com>.

Single instance.  Thx.

On Tue, May 18, 2010 at 4:49 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:

> Hi Mike,
>
> Shouldn't happen. You running this on a single instance or on a hadoop
> cluster? I will see if I can duplicate.
>
> Jeff

Re: Error Running Frequent Itemset Mining Example

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

Hi Mike,

Shouldn't happen. Are you running this on a single instance or on a Hadoop
cluster? I will see if I can duplicate it.

Jeff

On 5/18/10 4:27 PM, Mike Roberts wrote:
> Hey Guys,
>
> Just trying to get the example mentioned here working:
> https://cwiki.apache.org/MAHOUT/parallelfrequentpatternmining.html.
>
> I downloaded the accidents.dat file and placed it in
> /home/ubuntu/mahout-in/fpm-input.
> I created a directory for the output as /home/ubuntu/mahout-in/fpm-out.
> Then, I ran the following command:
> ./bin/mahout fpg --input /home/ubuntu/mahout-in/fpm-input --output /home/ubuntu/mahout-in/fpm-out --method mapreduce
>
> It runs for a bit and after the first step I get the following error:
>
> java.io.IOException: java.lang.ClassNotFoundException: org.apache.mahout.common.Pair
>          at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:55)
>          at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:36)
>          at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
>          at org.apache.mahout.fpm.pfpgrowth.PFPGrowth.deserializeList(PFPGrowth.java:84)
>          at org.apache.mahout.fpm.pfpgrowth.TransactionSortingMapper.setup(TransactionSortingMapper.java:77)
>          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>          at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
>
> The step that it was running:
> 10/05/18 23:10:18 INFO pfpgrowth.PFPGrowth: No of Features: 30
> 10/05/18 23:10:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 10/05/18 23:10:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to process : 1
> 10/05/18 23:10:19 INFO mapred.JobClient: Running job: job_local_0002
> 10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to process : 1
> 10/05/18 23:10:19 INFO mapred.MapTask: io.sort.mb = 100
> 10/05/18 23:10:19 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/05/18 23:10:19 INFO mapred.MapTask: record buffer = 262144/327680
> 10/05/18 23:10:19 WARN mapred.LocalJobRunner: job_local_0002
>
> Anyone know what's going on here, or have a solution?  I verified that the
> source file (Pair.java) exists in
> /trunk/core/src/main/java/org/apache/mahout/common.  I did an mvn install in
> core just to be sure.  I'm running Hadoop 0.20.2 on Ubuntu 10.04 on EC2.  BTW,
> if it's not obvious, I'm a total Mahout n00b.
>
> Thanks,
>
> Mike
>
>