Posted to user@mahout.apache.org by PierLorenzo Bianchini <pi...@yahoo.com.INVALID> on 2015/04/10 14:49:16 UTC
evaluating recommender
Hi all,
I have a question on the results of an evaluation (I'm using "RMSRecommenderEvaluator").
I'm getting a result of "0.7432629235004433" with one of the recommenders I'm testing. I read in several places that "0.0" would be the perfect result, but I couldn't find which ranges are acceptable.
I've seen values ranging from 0.49 to 1.04 with different implementations (I mostly do user-based with Pearson and model-based with SVD transformations) and different parameter settings. I've also seen values up to 3.0, but I was testing "bad" cases (low amount of data used, bad percentage of training data, etc.; I guess I could get results even worse than that but I didn't try)
When can I consider that my recommender is "good enough", and when should I consider that my evaluation is too bad? (For now I randomly assumed that 0.9 is a good value and I'm trying to stick around that value.)
Perhaps someone knows where I could find documentation for this? Any help would be appreciated.
Thank you! Regards,
PL
*FYI* I have a user/movie/rating dataset: 6000 users for 3900 movies. I have a static training file with 800,000 triplets and I'm using them to evaluate different types of recommender (this is a university requirement, I'm not talking about production environments)
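As background for the question above: RMSE is the square root of the mean squared difference between predicted and actual ratings, so its magnitude is only meaningful relative to the rating scale. A minimal sketch with hypothetical ratings on a 1-5 scale (plain Java, not Mahout's RMSRecommenderEvaluator):

```java
// Minimal RMSE sketch; the rating values are made up for illustration.
public class RmseSketch {
    static double rmse(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = predicted[i] - actual[i]; // error for one prediction
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length); // same units as the rating scale
    }

    public static void main(String[] args) {
        double[] actual    = {4.0, 3.0, 5.0, 2.0};
        double[] predicted = {3.5, 3.0, 4.0, 3.0};
        System.out.println(rmse(actual, predicted)); // prints 0.75
    }
}
```

With these example numbers the result is 0.75, i.e. predictions are off by roughly three quarters of a star on average; the same 0.75 would mean something very different on a 0-100 scale, which is why there is no universal "acceptable" range.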
Re: evaluating recommender
Posted by PierLorenzo Bianchini <pi...@yahoo.com.INVALID>.
Oh right, why did I not think about that :) you're totally right.
Thanks a bunch! Concerning MAP and the other methods you mentioned: nice, I had a quick look. I'll definitely dig deeper after I'm done with my submission (tomorrow night...).
Regards,
PL
--------------------------------------------
On Fri, 4/10/15, Pat Ferrel <pa...@occamsmachete.com> wrote:
Subject: Re: evaluating recommender
To: "user@mahout.apache.org" <us...@mahout.apache.org>
Date: Friday, April 10, 2015, 11:42 PM
I think that depends on the rating range you are using. It measures the error between predicted and actual rating. Google RMSE for a better explanation.
BTW that is an old and not very good metric. It was popularized by the Netflix prize many years ago when they thought they wanted to predict ratings. Actually even Netflix admits that _ranking_ recs is far more important. If you can only show a few recs they had better be ranked the best you can. For this a precision metric is better. I use mean average precision (MAP).
Be aware also that using an offline metric to judge different algorithms is not very reliable. Online A/B or Bayesian Bandit tests are much better.
On Apr 10, 2015, at 5:49 AM, PierLorenzo Bianchini <pi...@yahoo.com.INVALID> wrote:
Hi all,
I have a question on the results of an evaluation (I'm using "RMSRecommenderEvaluator").
I'm getting a result of "0.7432629235004433" with one of the recommenders I'm testing. I read in several places that "0.0" would be the perfect result, but I couldn't find which ranges are acceptable.
I've seen values ranging from 0.49 to 1.04 with different implementations (I mostly do user-based with Pearson and model-based with SVD transformations) and different parameter settings. I've also seen values up to 3.0, but I was testing "bad" cases (low amount of data used, bad percentage of training data, etc.; I guess I could get results even worse than that but I didn't try)
When can I consider that my recommender is "good enough", and when should I consider that my evaluation is too bad? (For now I randomly assumed that 0.9 is a good value and I'm trying to stick around that value.)
Perhaps someone knows where I could find documentation for this? Any help would be appreciated.
Thank you! Regards,
PL
*FYI* I have a user/movie/rating dataset: 6000 users for 3900 movies. I have a static training file with 800,000 triplets and I'm using them to evaluate different types of recommender (this is a university requirement, I'm not talking about production environments)
Re: evaluating recommender
Posted by Pat Ferrel <pa...@occamsmachete.com>.
I think that depends on the rating range you are using. It measures the error between predicted and actual rating. Google RMSE for a better explanation.
BTW that is an old and not very good metric. It was popularized by the Netflix prize many years ago when they thought they wanted to predict ratings. Actually even Netflix admits that _ranking_ recs is far more important. If you can only show a few recs they had better be ranked the best you can. For this a precision metric is better. I use mean average precision (MAP).
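Mean average precision, mentioned above, rewards ranking relevant items near the top of the list. A sketch of average precision for one user's ranked recommendations (MAP is the mean of this value over all users; the item IDs are made up):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Average precision for one user's ranked list; MAP averages this over users.
public class ApSketch {
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        double hits = 0.0, sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits += 1.0;
                sum += hits / (i + 1); // precision at the cut-off of each hit
            }
        }
        return relevant.isEmpty() ? 0.0 : sum / relevant.size();
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("m1", "m2", "m3", "m4");
        Set<String> relevant = new HashSet<>(Arrays.asList("m1", "m3"));
        // Hits at ranks 1 and 3 give (1/1 + 2/3) / 2, about 0.83.
        System.out.println(averagePrecision(ranked, relevant));
    }
}
```

Unlike RMSE, this score only looks at the order of the recommendations, so it matches the "show a few recs, ranked the best you can" use case.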
Be aware also that using an offline metric to judge different algorithms is not very reliable. Online A/B or Bayesian Bandit tests are much better.
On Apr 10, 2015, at 5:49 AM, PierLorenzo Bianchini <pi...@yahoo.com.INVALID> wrote:
Hi all,
I have a question on the results of an evaluation (I'm using "RMSRecommenderEvaluator").
I'm getting a result of "0.7432629235004433" with one of the recommenders I'm testing. I read in several places that "0.0" would be the perfect result, but I couldn't find which ranges are acceptable.
I've seen values ranging from 0.49 to 1.04 with different implementations (I mostly do user-based with Pearson and model-based with SVD transformations) and different parameter settings. I've also seen values up to 3.0, but I was testing "bad" cases (low amount of data used, bad percentage of training data, etc.; I guess I could get results even worse than that but I didn't try)
When can I consider that my recommender is "good enough", and when should I consider that my evaluation is too bad? (For now I randomly assumed that 0.9 is a good value and I'm trying to stick around that value.)
Perhaps someone knows where I could find documentation for this? Any help would be appreciated.
Thank you! Regards,
PL
*FYI* I have a user/movie/rating dataset: 6000 users for 3900 movies. I have a static training file with 800,000 triplets and I'm using them to evaluate different types of recommender (this is a university requirement, I'm not talking about production environments)
Re: Run ItemSimilarityJob Problem
Posted by Pat Ferrel <pa...@occamsmachete.com>.
Mahout should work with any Hadoop 1 or 2 version; note that this is for Mahout 0.10.0.
On Apr 26, 2015, at 9:22 PM, lastarsenal <la...@163.com> wrote:
Thanks for your help. It was probably that our Hadoop system and classpath jar packages (maybe the apache-cli version problem) were NOT compatible with Mahout.
So I rewrote the jobs in ItemSimilarityJob in my own project, and then it worked!
On 2015-04-16 21:21:06, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
> As I said below “mahout itemsimilarity …”
>
> “mahout” will show a list of commands
> “mahout itemsimilarity” will show the command help
>
> You are using HDFS and I suspect /home/hadoop/itembased/user_item is not a valid HDFS path? If so put the data in HDFS and use that path. Usually no need to specify the tmp dir.
>
> On Apr 14, 2015, at 9:05 PM, lastarsenal <la...@163.com> wrote:
>
> Hi, Pat,
>
>
> I have tried to give ItemSimilarityJob a minimal set of arguments, as below:
>
>
> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE
>
>
> the argument parser error disappeared, but another error came out:
> Exception in thread "main" java.io.IOException: resolve path must start with /, temp/prepareRatingMatrix/numUsers.bin
> at org.apache.hadoop.fs.viewfs.MountTree.resolve(MountTree.java:272)
> at org.apache.hadoop.fs.viewfs.ViewFs.open(ViewFs.java:139)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:394)
> at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
> at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:147)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:93)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:166)
>
>
> Then I tried to add the --tempDir argument:
> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE --tempDir=/tmp
> The argument parser error was back:
> ERROR common.AbstractJob: Unexpected --tempDir=/tmp while processing Job-Specific Options:
> Unexpected --tempDir=/tmp while processing Job-Specific Options:
> Usage:
> [--input <input> --output <output> --similarityClassname <similarityClassname>
> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
> <startPhase> --endPhase <endPhase>]
>
>
> So... you advised using the command line "mahout xxx"; however, there is no mahout command. How can I solve that?
>
>
> Thanks a lot!
>
> On 2015-04-15 03:13:23, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>
>> Also, you don’t need to specify -mp 0; that is always allowed. You are specifying a minimum if there is one, so -mp 0 is not valid; omit it.
>>
>> On Apr 14, 2015, at 11:59 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>>
>> use
>>
>> “mahout itemsimilarity …”
>>
>> But be aware that you have to convert all your user and item ids into non-negative ints. Basically inside Mahout-MapReduce they are assumed to be row and column numbers in a big matrix of all input.
>>
>> BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can read without the ID restrictions.
>>
>> On Apr 12, 2015, at 8:04 PM, lastarsenal <la...@163.com> wrote:
>>
>> Hi, Pat,
>> I think it would be better to follow the existing system instead of making a large-scale data transfer.
>>
>>
>> So I would appreciate it if somebody could give advice based on Hadoop. Thank you.
>>
>>
>>
>>
>>
>> On 2015-04-13 00:33:48, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>>> You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
>>>
>>> You’ll need to install Spark alongside Mahout then invoke with:
>>>
>>> mahout spark-itemsimilarity -i input -o output ….
>>>
>>> The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
>>>
>>>
>>> On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm a Mahout rookie. Recently, when I tried to run ItemSimilarityJob on my own Hadoop cluster, I hit a problem. The command is:
>>>
>>> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
>>>
>>>
>>> There is 1 error:
>>> 15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
>>> Unexpected 0 while processing Job-Specific Options:
>>> Usage:
>>> [--input <input> --output <output> --similarityClassname <similarityClassname>
>>> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
>>> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
>>> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
>>> <startPhase> --endPhase <endPhase>]
>>>
>>>
>>> What's the reason for this situation? Thank you!
>>>
>>>
>>> Best Regards,
>>> lastarsenal
>>>
>>
>>
>
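The ID conversion in the advice quoted above ("convert all your user and item ids into non-negative ints") can be done with a small dictionary that hands out contiguous indexes. A sketch under that assumption (the IdIndex class and IDs are hypothetical, not part of Mahout):

```java
import java.util.HashMap;
import java.util.Map;

// Map external string IDs to contiguous non-negative ints, since the Hadoop
// ItemSimilarityJob treats user/item IDs as row/column numbers of a matrix.
public class IdIndex {
    private final Map<String, Integer> index = new HashMap<>();

    int toInt(String externalId) {
        // size() is read before the new entry is inserted, so IDs are 0, 1, 2, ...
        return index.computeIfAbsent(externalId, k -> index.size());
    }

    public static void main(String[] args) {
        IdIndex users = new IdIndex();
        System.out.println(users.toInt("user-abc")); // 0
        System.out.println(users.toInt("user-xyz")); // 1
        System.out.println(users.toInt("user-abc")); // 0 again: the mapping is stable
    }
}
```

Keep a reverse map as well if you need to translate the job's output row/column numbers back to the original IDs.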
Re: Run ItemSimilarityJob Problem
Posted by Pat Ferrel <pa...@occamsmachete.com>.
Mahout should work with any Hadoop 1 or 2 version.
On Apr 26, 2015, at 9:22 PM, lastarsenal <la...@163.com> wrote:
Thanks for your help. It was probably that our Hadoop system and classpath jar packages (maybe the apache-cli version problem) were NOT compatible with Mahout.
So I rewrote the jobs in ItemSimilarityJob in my own project, and then it worked!
On 2015-04-16 21:21:06, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
> As I said below “mahout itemsimilarity …”
>
> “mahout” will show a list of commands
> “mahout itemsimilarity” will show the command help
>
> You are using HDFS and I suspect /home/hadoop/itembased/user_item is not a valid HDFS path? If so put the data in HDFS and use that path. Usually no need to specify the tmp dir.
>
> On Apr 14, 2015, at 9:05 PM, lastarsenal <la...@163.com> wrote:
>
> Hi, Pat,
>
>
> I have tried to give ItemSimilarityJob a minimal set of arguments, as below:
>
>
> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE
>
>
> the argument parser error disappeared, but another error came out:
> Exception in thread "main" java.io.IOException: resolve path must start with /, temp/prepareRatingMatrix/numUsers.bin
> at org.apache.hadoop.fs.viewfs.MountTree.resolve(MountTree.java:272)
> at org.apache.hadoop.fs.viewfs.ViewFs.open(ViewFs.java:139)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:394)
> at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
> at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:147)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:93)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:166)
>
>
> Then I tried to add the --tempDir argument:
> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE --tempDir=/tmp
> The argument parser error was back:
> ERROR common.AbstractJob: Unexpected --tempDir=/tmp while processing Job-Specific Options:
> Unexpected --tempDir=/tmp while processing Job-Specific Options:
> Usage:
> [--input <input> --output <output> --similarityClassname <similarityClassname>
> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
> <startPhase> --endPhase <endPhase>]
>
>
> So... you advised using the command line "mahout xxx"; however, there is no mahout command. How can I solve that?
>
>
> Thanks a lot!
>
> On 2015-04-15 03:13:23, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>
>> Also, you don’t need to specify -mp 0; that is always allowed. You are specifying a minimum if there is one, so -mp 0 is not valid; omit it.
>>
>> On Apr 14, 2015, at 11:59 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>>
>> use
>>
>> “mahout itemsimilarity …”
>>
>> But be aware that you have to convert all your user and item ids into non-negative ints. Basically inside Mahout-MapReduce they are assumed to be row and column numbers in a big matrix of all input.
>>
>> BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can read without the ID restrictions.
>>
>> On Apr 12, 2015, at 8:04 PM, lastarsenal <la...@163.com> wrote:
>>
>> Hi, Pat,
>> I think it would be better to follow the existing system instead of making a large-scale data transfer.
>>
>>
>> So I would appreciate it if somebody could give advice based on Hadoop. Thank you.
>>
>>
>>
>>
>>
>> On 2015-04-13 00:33:48, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>>> You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
>>>
>>> You’ll need to install Spark alongside Mahout then invoke with:
>>>
>>> mahout spark-itemsimilarity -i input -o output ….
>>>
>>> The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
>>>
>>>
>>> On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm a Mahout rookie. Recently, when I tried to run ItemSimilarityJob on my own Hadoop cluster, I hit a problem. The command is:
>>>
>>> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
>>>
>>>
>>> There is 1 error:
>>> 15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
>>> Unexpected 0 while processing Job-Specific Options:
>>> Usage:
>>> [--input <input> --output <output> --similarityClassname <similarityClassname>
>>> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
>>> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
>>> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
>>> <startPhase> --endPhase <endPhase>]
>>>
>>>
>>> What's the reason for this situation? Thank you!
>>>
>>>
>>> Best Regards,
>>> lastarsenal
>>>
>>
>>
>
Re:Re: Run ItemSimilarityJob Problem
Posted by lastarsenal <la...@163.com>.
Thanks for your help. It was probably that our Hadoop system and classpath jar packages (maybe the apache-cli version problem) were NOT compatible with Mahout.
So I rewrote the jobs in ItemSimilarityJob in my own project, and then it worked!
On 2015-04-16 21:21:06, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>As I said below “mahout itemsimilarity …”
>
>“mahout” will show a list of commands
>“mahout itemsimilarity” will show the command help
>
>You are using HDFS and I suspect /home/hadoop/itembased/user_item is not a valid HDFS path? If so put the data in HDFS and use that path. Usually no need to specify the tmp dir.
>
>On Apr 14, 2015, at 9:05 PM, lastarsenal <la...@163.com> wrote:
>
>Hi, Pat,
>
>
> I have tried to give ItemSimilarityJob a minimal set of arguments, as below:
>
>
>hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE
>
>
>the argument parser error disappeared, but another error came out:
>Exception in thread "main" java.io.IOException: resolve path must start with /, temp/prepareRatingMatrix/numUsers.bin
> at org.apache.hadoop.fs.viewfs.MountTree.resolve(MountTree.java:272)
> at org.apache.hadoop.fs.viewfs.ViewFs.open(ViewFs.java:139)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:394)
> at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
> at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:147)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:93)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:166)
>
>
>Then I tried to add the --tempDir argument:
>hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE --tempDir=/tmp
>The argument parser error was back:
>ERROR common.AbstractJob: Unexpected --tempDir=/tmp while processing Job-Specific Options:
>Unexpected --tempDir=/tmp while processing Job-Specific Options:
>Usage:
>[--input <input> --output <output> --similarityClassname <similarityClassname>
>--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
>--minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
><threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
><startPhase> --endPhase <endPhase>]
>
>
>So... you advised using the command line "mahout xxx"; however, there is no mahout command. How can I solve that?
>
>
>Thanks a lot!
>
>On 2015-04-15 03:13:23, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>
>> Also, you don’t need to specify -mp 0; that is always allowed. You are specifying a minimum if there is one, so -mp 0 is not valid; omit it.
>>
>> On Apr 14, 2015, at 11:59 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>>
>> use
>>
>> “mahout itemsimilarity …”
>>
>> But be aware that you have to convert all your user and item ids into non-negative ints. Basically inside Mahout-MapReduce they are assumed to be row and column numbers in a big matrix of all input.
>>
>> BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can read without the ID restrictions.
>>
>> On Apr 12, 2015, at 8:04 PM, lastarsenal <la...@163.com> wrote:
>>
>> Hi, Pat,
>> I think it would be better to follow the existing system instead of making a large-scale data transfer.
>>
>>
>> So I would appreciate it if somebody could give advice based on Hadoop. Thank you.
>>
>>
>>
>>
>>
>> On 2015-04-13 00:33:48, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>>> You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
>>>
>>> You’ll need to install Spark alongside Mahout then invoke with:
>>>
>>> mahout spark-itemsimilarity -i input -o output ….
>>>
>>> The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
>>>
>>>
>>> On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm a Mahout rookie. Recently, when I tried to run ItemSimilarityJob on my own Hadoop cluster, I hit a problem. The command is:
>>>
>>> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
>>>
>>>
>>> There is 1 error:
>>> 15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
>>> Unexpected 0 while processing Job-Specific Options:
>>> Usage:
>>> [--input <input> --output <output> --similarityClassname <similarityClassname>
>>> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
>>> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
>>> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
>>> <startPhase> --endPhase <endPhase>]
>>>
>>>
>>> What's the reason for this situation? Thank you!
>>>
>>>
>>> Best Regards,
>>> lastarsenal
>>>
>>
>>
>
Re: Run ItemSimilarityJob Problem
Posted by Pat Ferrel <pa...@occamsmachete.com>.
As I said below “mahout itemsimilarity …”
“mahout” will show a list of commands
“mahout itemsimilarity” will show the command help
You are using HDFS and I suspect /home/hadoop/itembased/user_item is not a valid HDFS path? If so put the data in HDFS and use that path. Usually no need to specify the tmp dir.
On Apr 14, 2015, at 9:05 PM, lastarsenal <la...@163.com> wrote:
Hi, Pat,
I have tried to give ItemSimilarityJob a minimal set of arguments, as below:
hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE
the argument parser error disappeared, but another error came out:
Exception in thread "main" java.io.IOException: resolve path must start with /, temp/prepareRatingMatrix/numUsers.bin
at org.apache.hadoop.fs.viewfs.MountTree.resolve(MountTree.java:272)
at org.apache.hadoop.fs.viewfs.ViewFs.open(ViewFs.java:139)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:394)
at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:166)
Then I tried to add the --tempDir argument:
hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE --tempDir=/tmp
The argument parser error was back:
ERROR common.AbstractJob: Unexpected --tempDir=/tmp while processing Job-Specific Options:
Unexpected --tempDir=/tmp while processing Job-Specific Options:
Usage:
[--input <input> --output <output> --similarityClassname <similarityClassname>
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
--minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
<threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
<startPhase> --endPhase <endPhase>]
So... you advised using the command line "mahout xxx"; however, there is no mahout command. How can I solve that?
Thanks a lot!
On 2015-04-15 03:13:23, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
> Also, you don’t need to specify -mp 0; that is always allowed. You are specifying a minimum if there is one, so -mp 0 is not valid; omit it.
>
> On Apr 14, 2015, at 11:59 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
> use
>
> “mahout itemsimilarity …”
>
> But be aware that you have to convert all your user and item ids into non-negative ints. Basically inside Mahout-MapReduce they are assumed to be row and column numbers in a big matrix of all input.
>
> BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can read without the ID restrictions.
>
> On Apr 12, 2015, at 8:04 PM, lastarsenal <la...@163.com> wrote:
>
> Hi, Pat,
> I think it would be better to follow the existing system instead of making a large-scale data transfer.
>
>
> So I would appreciate it if somebody could give advice based on Hadoop. Thank you.
>
>
>
>
>
> On 2015-04-13 00:33:48, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>> You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
>>
>> You’ll need to install Spark alongside Mahout then invoke with:
>>
>> mahout spark-itemsimilarity -i input -o output ….
>>
>> The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
>>
>>
>> On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:
>>
>> Hi,
>>
>> I'm a Mahout rookie. Recently, when I tried to run ItemSimilarityJob on my own Hadoop cluster, I hit a problem. The command is:
>>
>> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
>>
>>
>> There is 1 error:
>> 15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
>> Unexpected 0 while processing Job-Specific Options:
>> Usage:
>> [--input <input> --output <output> --similarityClassname <similarityClassname>
>> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
>> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
>> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
>> <startPhase> --endPhase <endPhase>]
>>
>>
>> What's the reason for this situation? Thank you!
>>
>>
>> Best Regards,
>> lastarsenal
>>
>
>
Re:Re: Run ItemSimilarityJob Problem
Posted by lastarsenal <la...@163.com>.
Hi, Pat,
I have tried to give ItemSimilarityJob a minimal set of arguments, as below:
hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE
the argument parser error disappeared, but another error came out:
Exception in thread "main" java.io.IOException: resolve path must start with /, temp/prepareRatingMatrix/numUsers.bin
at org.apache.hadoop.fs.viewfs.MountTree.resolve(MountTree.java:272)
at org.apache.hadoop.fs.viewfs.ViewFs.open(ViewFs.java:139)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:394)
at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:166)
Then I tried to add the --tempDir argument:
hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE --tempDir=/tmp
The argument parser error was back:
ERROR common.AbstractJob: Unexpected --tempDir=/tmp while processing Job-Specific Options:
Unexpected --tempDir=/tmp while processing Job-Specific Options:
Usage:
[--input <input> --output <output> --similarityClassname <similarityClassname>
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
--minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
<threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
<startPhase> --endPhase <endPhase>]
So... you advised using the command line "mahout xxx"; however, there is no mahout command. How can I solve that?
Thanks a lot!
On 2015-04-15 03:13:23, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>Also, you don’t need to specify -mp 0; that is always allowed. You are specifying a minimum if there is one, so -mp 0 is not valid; omit it.
>
>On Apr 14, 2015, at 11:59 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
>use
>
>“mahout itemsimilarity …”
>
>But be aware that you have to convert all your user and item ids into non-negative ints. Basically inside Mahout-MapReduce they are assumed to be row and column numbers in a big matrix of all input.
>
>BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can read without the ID restrictions.
>
>On Apr 12, 2015, at 8:04 PM, lastarsenal <la...@163.com> wrote:
>
>Hi, Pat,
> I think it would be better to follow the existing system instead of making a large-scale data transfer.
>
>
> So I would appreciate it if somebody could give advice based on Hadoop. Thank you.
>
>
>
>
>
>On 2015-04-13 00:33:48, "Pat Ferrel" <pa...@occamsmachete.com> wrote:
>> You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
>>
>> You’ll need to install Spark alongside Mahout then invoke with:
>>
>> mahout spark-itemsimilarity -i input -o output ….
>>
>> The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
>>
>>
>> On Apr 11, 2015, at 12:34 AM, lastarsenal <la...@163.com> wrote:
>>
>> Hi,
>>
>> I'm a rookie with Mahout. Recently, when I tried to run ItemSimilarityJob on my own Hadoop cluster, I ran into a problem. The command is:
>>
>> hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
>>
>>
>> There is 1 error:
>> 15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
>> Unexpected 0 while processing Job-Specific Options:
>> Usage:
>> [--input <input> --output <output> --similarityClassname <similarityClassname>
>> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
>> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
>> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
>> <startPhase> --endPhase <endPhase>]
>>
>>
>> What's the reason for this error? Thank you!
>>
>>
>> Best Regards,
>> lastarsenal
>>
>
>
Re: Run ItemSimilarityJob Problem
Posted by Pat Ferrel <pa...@occamsmachete.com>.
Also, you don’t need to specify -mp 0; a minimum of 0 is always allowed by default. The option sets a minimum only when you have one, so -mp 0 is not valid; omit it.
Re: Run ItemSimilarityJob Problem
Posted by Pat Ferrel <pa...@occamsmachete.com>.
use
“mahout itemsimilarity …”
But be aware that you have to convert all your user and item ids into non-negative ints. Basically inside Mahout-MapReduce they are assumed to be row and column numbers in a big matrix of all input.
BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can read without the ID restrictions.
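[Note for archive readers: the ID conversion Pat describes can be sketched as a small preprocessing step. This is a hypothetical helper, not part of Mahout; it assigns each raw user/item ID a contiguous non-negative int, which Mahout-MapReduce treats as a row/column number, and emits the "userID,itemID,pref" lines that ItemSimilarityJob reads.]

```python
def remap_ids(triples):
    """Map arbitrary (user, item, pref) triples to contiguous
    non-negative int IDs, the form Mahout-MapReduce expects.
    Returns the remapped triples plus the user and item indexes,
    which you keep around to translate the job's output back."""
    users, items, out = {}, {}, []
    for u, i, p in triples:
        out.append((users.setdefault(u, len(users)),
                    items.setdefault(i, len(items)), p))
    return out, users, items

raw = [("alice", "movie:42", 4.0), ("bob", "movie:7", 3.5), ("alice", "movie:7", 5.0)]
rows, user_idx, item_idx = remap_ids(raw)
# One "userID,itemID,pref" line per triple, ready to upload to HDFS:
lines = ["%d,%d,%s" % r for r in rows]
print("\n".join(lines))  # -> 0,0,4.0 / 1,1,3.5 / 0,1,5.0
```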
Re:Re: Run ItemSimilarityJob Problem
Posted by lastarsenal <la...@163.com>.
Hi, Pat,
I think it would be better to follow the existing system instead of making a large-scale data transfer.
So I would appreciate it if somebody could give advice based on Hadoop. Thank you.
Re: Run ItemSimilarityJob Problem
Posted by Pat Ferrel <pa...@occamsmachete.com>.
You are invoking it incorrectly but I’d suggest using the newer Spark version. It’s easier to use and about 10x faster.
You’ll need to install Spark alongside Mahout then invoke with:
mahout spark-itemsimilarity -i input -o output ….
The driver is documented here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
Run ItemSimilarityJob Problem
Posted by lastarsenal <la...@163.com>.
Hi,
I'm a rookie with Mahout. Recently, when I tried to run ItemSimilarityJob on my own Hadoop cluster, I ran into a problem. The command is:
hadoop jar mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
There is 1 error:
15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
Unexpected 0 while processing Job-Specific Options:
Usage:
[--input <input> --output <output> --similarityClassname <similarityClassname>
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
--minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
<threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
<startPhase> --endPhase <endPhase>]
What's the reason for this error? Thank you!
Best Regards,
lastarsenal