Posted to user@mahout.apache.org by lastarsenal <la...@163.com> on 2015/04/11 09:34:41 UTC

Run ItemSimilarityJob Problem

Hi,

I'm new to Mahout. Recently, when I tried to run ItemSimilarityJob on my own Hadoop cluster, I ran into a problem. The command is:
  
hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0


There is 1 error:
15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
Unexpected 0 while processing Job-Specific Options:                             
Usage:                                                                          
 [--input <input> --output <output> --similarityClassname <similarityClassname> 
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>         
--minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold     
<threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase   
<startPhase> --endPhase <endPhase>]    


What is the reason for this error? Thank you!


Best Regards,
lastarsenal

Re: Run ItemSimilarityJob Problem

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Mahout should work with any Hadoop 1 or 2 version. Note that this applies to Mahout 0.10.0.

On Apr 26, 2015, at 9:22 PM, lastarsenal <la...@163.com> wrote:

Thank you for your help. It may be that our Hadoop system and the jar packages on its classpath (possibly an Apache CLI version problem) were NOT compatible with Mahout.


So I rewrote the jobs from ItemSimilarityJob in my own project, and now it works!

On 2015-04-16 21:21:06, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
> As I said below “mahout itemsimilarity …”
> 
> “mahout” will show a list of commands
> “mahout itemsimilarity” will show the command help
> 
> You are using HDFS, and I suspect /home/hadoop/itembased/user_item is not a valid HDFS path. If so, put the data in HDFS and use that path. There is usually no need to specify the temp dir.
> 
> On Apr 14, 2015, at 9:05 PM, lastarsenal <la...@163.com> wrote:
> 
> Hi, Pat,
> 
> 
> I tried running ItemSimilarityJob with a minimal set of arguments, as below:
> 
> 
> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE
> 
> 
> The argument parser error went away, but another error appeared:
> Exception in thread "main" java.io.IOException: resolve path must start with /, temp/prepareRatingMatrix/numUsers.bin
>     at org.apache.hadoop.fs.viewfs.MountTree.resolve(MountTree.java:272)
>     at org.apache.hadoop.fs.viewfs.ViewFs.open(ViewFs.java:139)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:394)
>     at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
>     at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:147)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:93)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:166)
> 
> 
> Then I tried adding the --tempDir argument:
> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE --tempDir=/tmp
> The argument parser error was back:
> ERROR common.AbstractJob: Unexpected --tempDir=/tmp while processing Job-Specific Options:
> Unexpected --tempDir=/tmp while processing Job-Specific Options:                
> Usage:                                                                          
> [--input <input> --output <output> --similarityClassname <similarityClassname> 
> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>         
> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold     
> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase   
> <startPhase> --endPhase <endPhase>]                                      
> 
> 
> So... You advised using the command line "mahout xxx"; however, there is no mahout command on my system. How can I solve this?
> 
> 
> Thanks a lot! 
> 
> On 2015-04-15 03:13:23, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
> 
>> Also, you don’t need to specify -mp 0. Zero is always allowed; the option sets a minimum only when you want one, so -mp 0 is not a valid setting. Omit it.
>> 
>> On Apr 14, 2015, at 11:59 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>> 
>> use 
>> 
>> “mahout itemsimilarity …”
>> 
>> But be aware that you have to convert all your user and item ids into non-negative ints. Basically inside Mahout-MapReduce they are assumed to be row and column numbers in a big matrix of all input. 
>> 
>> BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can read without the ID restrictions.
>> 
>> On Apr 12, 2015, at 8:04 PM, lastarsenal <la...@163.com> wrote:
>> 
>> Hi, Pat,
>> I think it would be better to stay with the existing system than to make a large-scale data transfer.
>> 
>> 
>> So I would greatly appreciate advice based on Hadoop. Thank you.
>> 
>> 
>> 
>> 
>> 
>> On 2015-04-13 00:33:48, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>>> You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
>>> 
>>> You’ll need to install Spark alongside Mahout then invoke with:
>>> 
>>> mahout spark-itemsimilarity -i input -o output ….
>>> 
>>> The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
>>> 
>>> 
>>> On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm new to Mahout. Recently, when I tried to run ItemSimilarityJob on my own Hadoop cluster, I ran into a problem. The command is:
>>> 
>>> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
>>> 
>>> 
>>> There is 1 error:
>>> 15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
>>> Unexpected 0 while processing Job-Specific Options:                             
>>> Usage:                                                                          
>>> [--input <input> --output <output> --similarityClassname <similarityClassname> 
>>> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>         
>>> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold     
>>> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase   
>>> <startPhase> --endPhase <endPhase>]    
>>> 
>>> 
>>> What is the reason for this error? Thank you!
>>> 
>>> 
>>> Best Regards,
>>> lastarsenal
>>> 
>> 
>> 
> 



Re: Run ItemSimilarityJob Problem

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Mahout should work with any Hadoop 1 or 2 version. 



Re:Re: Run ItemSimilarityJob Problem

Posted by lastarsenal <la...@163.com>.
Thank you for your help. It may be that our Hadoop system and the jar packages on its classpath (possibly an Apache CLI version problem) were NOT compatible with Mahout.


So I rewrote the jobs from ItemSimilarityJob in my own project, and now it works!


Re: Run ItemSimilarityJob Problem

Posted by Pat Ferrel <pa...@occamsmachete.com>.
As I said below “mahout itemsimilarity …”

“mahout” will show a list of commands
“mahout itemsimilarity” will show the command help

You are using HDFS, and I suspect /home/hadoop/itembased/user_item is not a valid HDFS path. If so, put the data in HDFS and use that path. There is usually no need to specify the temp dir.
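
A minimal sketch of that suggestion, assuming the input file sits on the local disk and that /user/hadoop/itembased is a usable HDFS directory (a hypothetical path, not one taken from this thread). The run helper is also hypothetical: it prints each command and executes it only when RUN=1 is set, so the steps can be reviewed before they touch the cluster.

```shell
#!/bin/sh
# Sketch only: HDFS_DIR is an assumed location; adjust it for your cluster.
LOCAL_INPUT=/home/hadoop/itembased/user_item   # local path used in this thread
HDFS_DIR=/user/hadoop/itembased                # hypothetical HDFS directory

# Print a command; execute it only when RUN=1 is set in the environment.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Copy the input into HDFS and confirm it is there.
run hadoop fs -mkdir "$HDFS_DIR"
run hadoop fs -put "$LOCAL_INPUT" "$HDFS_DIR/user_item"
run hadoop fs -ls "$HDFS_DIR"

# Run the job against the HDFS paths, leaving the temp dir at its default.
run mahout itemsimilarity -i "$HDFS_DIR/user_item" -o "$HDFS_DIR/output" -s SIMILARITY_EUCLIDEAN_DISTANCE
```

On Hadoop 2, add -p to the mkdir call if the parent directories do not exist yet.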



Re:Re: Run ItemSimilarityJob Problem

Posted by lastarsenal <la...@163.com>.
Hi, Pat,


I tried running ItemSimilarityJob with a minimal set of arguments, as below:


hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE


The argument parser error went away, but another error appeared:
Exception in thread "main" java.io.IOException: resolve path must start with /, temp/prepareRatingMatrix/numUsers.bin
        at org.apache.hadoop.fs.viewfs.MountTree.resolve(MountTree.java:272)
        at org.apache.hadoop.fs.viewfs.ViewFs.open(ViewFs.java:139)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:394)
        at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
        at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:93)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:166)


Then I tried adding the --tempDir argument:
hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE --tempDir=/tmp
The argument parser error was back:
ERROR common.AbstractJob: Unexpected --tempDir=/tmp while processing Job-Specific Options:
Unexpected --tempDir=/tmp while processing Job-Specific Options:                
Usage:                                                                          
 [--input <input> --output <output> --similarityClassname <similarityClassname> 
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>         
--minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold     
<threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase   
<startPhase> --endPhase <endPhase>]                                      


So... You advised using the command line "mahout xxx"; however, there is no mahout command on my system. How can I solve this?


Thanks a lot! 


Re: Run ItemSimilarityJob Problem

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Also, you don’t need to specify -mp 0. Zero is always allowed; the option sets a minimum only when you want one, so -mp 0 is not a valid setting. Omit it.
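
To check whether a minimum-preferences filter would matter for a given data set, the preferences per user can be counted before running the job. This is a sketch under two assumptions not confirmed in this thread: that -mp is the short form of the --minPrefsPerUser option shown in the usage text, and that the input is a text file of userID,itemID[,preference] lines. prefs_per_user is a hypothetical helper name.

```shell
# prefs_per_user FILE
# Print "userID count" for every user in a comma-separated
# userID,itemID[,preference] file, sorted by user ID.
prefs_per_user() {
  awk -F, '{ count[$1]++ } END { for (u in count) print u, count[u] }' "$1" | sort
}

# Example (hypothetical local copy of the input):
#   prefs_per_user user_item | awk '$2 < 2'   # users with fewer than 2 preferences
```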

On Apr 14, 2015, at 11:59 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

use 

“mahout itemsimilarity …”

But be aware that you have to convert all your user and item ids into non-negative ints. Basically inside Mahout-MapReduce they are assumed to be row and column numbers in a big matrix of all input. 

BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can read without the ID restrictions.

On Apr 12, 2015, at 8:04 PM, lastarsenal <la...@163.com> wrote:

Hi, Pat,
  I think it would better to follow the existing system instead of making a large scale data transfer. 


 So, I will be very appreciated if somebody can give the advice based on hadoop, Thank you.





在 2015-04-13 00:33:48,"Pat Ferrel" <pa...@occamsmachete.com> 写道:
> You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
> 
> You’ll need to install Spark alongside Mahout then invoke with:
> 
> mahout spark-itemsimilarity -i input -o output ….
> 
> The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
> 
> 
> On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:
> 
> Hi,
> 
> I'm new to Mahout. Recently, when I tried to run ItemSimilarityJob on my own Hadoop setup, I ran into a problem. The command is:
> 
> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
> 
> 
> There is 1 error:
> 15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
> Unexpected 0 while processing Job-Specific Options:                             
> Usage:                                                                          
> [--input <input> --output <output> --similarityClassname <similarityClassname> 
> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>         
> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold     
> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase   
> <startPhase> --endPhase <endPhase>]    
> 
> 
> What's the reason for this situation? Thank you!
> 
> 
> Best Regards,
> lastarsenal
> 



Re: Run ItemSimilarityJob Problem

Posted by Pat Ferrel <pa...@occamsmachete.com>.
use 

“mahout itemsimilarity …”

But be aware that you have to convert all your user and item ids into non-negative ints. Basically inside Mahout-MapReduce they are assumed to be row and column numbers in a big matrix of all input. 

BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can read without the ID restrictions.
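
For readers who need the ID conversion mentioned above, here is a minimal sketch of one way to do it. It assumes the raw data is a sequence of (user, item) string pairs and that the job reads plain text lines of the form userID,itemID; the function names build_index and encode_preferences are hypothetical helpers for illustration, not part of Mahout.

```python
from typing import Dict, Iterable, List, Tuple


def build_index(ids: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct ID a non-negative int, in first-seen order."""
    index: Dict[str, int] = {}
    for raw in ids:
        if raw not in index:
            index[raw] = len(index)  # 0, 1, 2, ... so never negative
    return index


def encode_preferences(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[List[str], Dict[str, int], Dict[str, int]]:
    """Turn (user, item) string pairs into 'userInt,itemInt' lines.

    Assumption: the downstream job accepts comma-separated int IDs,
    one interaction per line. The two dictionaries are returned so the
    job's output can be translated back to the original IDs.
    """
    rows = list(rows)
    user_index = build_index(user for user, _ in rows)
    item_index = build_index(item for _, item in rows)
    lines = [f"{user_index[user]},{item_index[item]}" for user, item in rows]
    return lines, user_index, item_index


if __name__ == "__main__":
    sample = [("alice", "book-1"), ("bob", "book-2"), ("alice", "book-2")]
    encoded, users, items = encode_preferences(sample)
    print("\n".join(encoded))
```

Users and items get separate index spaces here, matching the description of them as row and column numbers of one matrix. Keep both dictionaries around, since the similarity output will contain only the int IDs.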

On Apr 12, 2015, at 8:04 PM, lastarsenal <la...@163.com> wrote:

Hi, Pat,
   I think it would be better to stick with the existing system instead of doing a large-scale data transfer.

   So I would really appreciate it if somebody could give advice based on Hadoop. Thank you.





On 2015-04-13 00:33:48, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
> You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
> 
> You’ll need to install Spark alongside Mahout then invoke with:
> 
> mahout spark-itemsimilarity -i input -o output ….
> 
> The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
> 
> 
> On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:
> 
> Hi,
> 
> I'm new to Mahout. Recently, when I tried to run ItemSimilarityJob on my own Hadoop setup, I ran into a problem. The command is:
> 
> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
> 
> 
> There is 1 error:
> 15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
> Unexpected 0 while processing Job-Specific Options:                             
> Usage:                                                                          
> [--input <input> --output <output> --similarityClassname <similarityClassname> 
> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>         
> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold     
> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase   
> <startPhase> --endPhase <endPhase>]    
> 
> 
> What's the reason for this situation? Thank you!
> 
> 
> Best Regards,
> lastarsenal
> 


Re:Re: Run ItemSimilarityJob Problem

Posted by lastarsenal <la...@163.com>.
Hi, Pat,
    I think it would be better to stick with the existing system instead of doing a large-scale data transfer.

    So I would really appreciate it if somebody could give advice based on Hadoop. Thank you.





On 2015-04-13 00:33:48, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
>
>You’ll need to install Spark alongside Mahout then invoke with:
>
>mahout spark-itemsimilarity -i input -o output ….
>
>The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
>
>
>On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:
>
>Hi,
>
>  I'm new to Mahout. Recently, when I tried to run ItemSimilarityJob on my own Hadoop setup, I ran into a problem. The command is:
>
>hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
>
>
>There is 1 error:
>15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
>Unexpected 0 while processing Job-Specific Options:                             
>Usage:                                                                          
>[--input <input> --output <output> --similarityClassname <similarityClassname> 
>--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>         
>--minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold     
><threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase   
><startPhase> --endPhase <endPhase>]    
>
>
>What's the reason for this situation? Thank you!
>
>
>Best Regards,
>lastarsenal
>

Re: Run ItemSimilarityJob Problem

Posted by Pat Ferrel <pa...@occamsmachete.com>.
You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.

You’ll need to install Spark alongside Mahout then invoke with:

mahout spark-itemsimilarity -i input -o output ….

The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html


On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:

Hi,

  I'm new to Mahout. Recently, when I tried to run ItemSimilarityJob on my own Hadoop setup, I ran into a problem. The command is:

hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0


There is 1 error:
15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
Unexpected 0 while processing Job-Specific Options:                             
Usage:                                                                          
[--input <input> --output <output> --similarityClassname <similarityClassname> 
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>         
--minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold     
<threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase   
<startPhase> --endPhase <endPhase>]    


What's the reason for this situation? Thank you!


Best Regards,
lastarsenal