Posted to dev@mahout.apache.org by "Anatoliy Kats (Created) (JIRA)" <ji...@apache.org> on 2011/12/12 16:27:36 UTC

[jira] [Created] (MAHOUT-925) Evaluate the reach of recommender algorithms

Evaluate the reach of recommender algorithms
--------------------------------------------

                 Key: MAHOUT-925
                 URL: https://issues.apache.org/jira/browse/MAHOUT-925
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
    Affects Versions: 0.5
            Reporter: Anatoliy Kats
            Assignee: Sean Owen
            Priority: Minor


The evaluation of a CF algorithm should include reach, the proportion of users for whom a recommendation could be made.  An algorithm usually has a cutoff value on the confidence of the recommender, and if the confidence is not high enough, no recommendation is made.  Either the number of requested recommendations or this cutoff could be varied as part of the evaluation.  The proposed patch adds this.

My build with this patch breaks testMapper(org.apache.mahout.classifier.df.mapreduce.partial.Step1MapperTest): org.apache.mahout.classifier.df.node.Leaf.<init>(I)V .  The test seems unrelated to the patch, so I am assuming this is broken in the trunk head as well.  Unfortunately I am under a deadline, and I do not have time to write tests for the patch.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Anatoliy Kats (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168204#comment-13168204 ] 

Anatoliy Kats commented on MAHOUT-925:
--------------------------------------

That's a good point; we should be careful about how we analyze undersampled data.  The purpose of measuring reach is to predict what percentage of the audience in a production system will get a required number of recommendations.  Actually I think the easiest way to do this is to loop over the users and try to generate recommendations from a model that does not exclude any preferences.
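
A minimal sketch of that loop, under stated assumptions: FullModelReach and its parameters are hypothetical names, and the LongFunction stands in for a recommender trained on all preferences (it is not Mahout's Recommender interface, and this is not the attached patch).

```java
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical helper for illustration; not part of Mahout or the patch.
final class FullModelReach {

  private FullModelReach() {}

  /**
   * Fraction of users who receive at least `required` recommendations.
   * `recommend` is an assumed stand-in for a recommender built on the
   * full data model, with no preferences held out.
   */
  static double reach(long[] userIDs, LongFunction<List<Long>> recommend, int required) {
    if (userIDs.length == 0) {
      return Double.NaN; // reach is undefined when there are no users
    }
    int served = 0;
    for (long userID : userIDs) {
      List<Long> items = recommend.apply(userID);
      // A null or short list means this user did not get the required number.
      if (items != null && items.size() >= required) {
        served++;
      }
    }
    return (double) served / userIDs.length;
  }
}
```

Varying `required` then shows how reach falls off as more recommendations are requested per user.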

Also, in the spirit of creating conditions maximally similar to a production environment, it seems unfair to exclude users because the evaluator judges there are not enough preferences remaining (lines 116-118 in the patched code).  The recommender should decide for itself whether or not to generate anything.  Only if it refuses to generate the required number of recommendations do we exclude the user from the IR statistics.  This kind of change would always make precision and recall equal.  They often are in practice.  What was the original motivation for including both statistics?
                

[jira] [Commented] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168299#comment-13168299 ] 

Sean Owen commented on MAHOUT-925:
----------------------------------

Yes you could create a different kind of test that doesn't hold out any data to find this reach figure. I don't think it's worth a whole different test class just for this. The entire test framework is only valid insofar as you run it on enough data, with enough to train, that the result reflects how the full system works. So I think it's as valid as anything else to run on the training data only.

Regarding the "2@" prefs heuristic: it's not really a question of the recommender deciding *not* to recommend. It's that it will *always* recommend as much as possible, up to what you ask for. But if the test is based on so little data to begin with, the result is not very meaningful. If I am figuring precision@5 and the user has only 4 prefs, what can I do? I can't even call all 4 "relevant" items since it would leave no training data. Even if I did, there would be no way to achieve 100% precision as there are only 4 relevant items. I (arbitrarily) picked 2@ as the minimum -- 10 here if @=5 -- since you can select 5 of the 10 in this case as relevant, and have as many available for training.
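
The "2@" rule described above can be sketched as follows. This is an illustration only: HoldOutSplit is a hypothetical class, and treating the first @ items of a best-first list as the relevant ones is an assumption, not necessarily how the evaluator picks them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "2@" minimum; not the Mahout source.
final class HoldOutSplit {

  final List<Long> relevant = new ArrayList<>();
  final List<Long> training = new ArrayList<>();

  private HoldOutSplit() {}

  /** A user is evaluated only with at least 2 * at prefs (10 when at = 5). */
  static boolean hasEnoughPrefs(int numPrefs, int at) {
    return numPrefs >= 2 * at;
  }

  /**
   * Holds out the first `at` items as relevant and keeps the rest for
   * training. Returns null when the user has too few prefs to evaluate.
   * Assumes the list is ordered best-first.
   */
  static HoldOutSplit split(List<Long> itemIDsBestFirst, int at) {
    if (!hasEnoughPrefs(itemIDsBestFirst.size(), at)) {
      return null;
    }
    HoldOutSplit s = new HoldOutSplit();
    s.relevant.addAll(itemIDsBestFirst.subList(0, at));
    s.training.addAll(itemIDsBestFirst.subList(at, itemIDsBestFirst.size()));
    return s;
  }
}
```

With @=5, a user with 4 prefs is skipped, while a user with 10 prefs yields 5 relevant items and 5 left for training.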

You would not want to drop a user's result just because it recommended 3 items in a test @5. That's a perfectly valid result (given the condition in the preceding paragraph) to include. You can still decide how many of those 3 are relevant, and how many of the relevant items are in those 3.

Precision and recall are not the same in general. If the number of items deemed relevant is equal to "@", then precision will equal recall, yes. And that is usually true for data with ratings, the way this class works. It will just choose some "@" of the items, as there is no basis to call one more relevant than the other. Choosing that many is also somewhat arbitrary; it can't be 0, and can't be all items (or there would be no training data from the user under test), so that looked like a nice round number.
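
To make the relationship concrete, here is a small sketch of both statistics for one user. The names are hypothetical, and dividing precision by the number of items actually recommended (rather than by "@") is an assumption of this sketch.

```java
import java.util.List;
import java.util.Set;

// Illustrative only; class and method names are hypothetical.
final class PrecisionRecallAt {

  private PrecisionRecallAt() {}

  /** Number of recommended items that were held out as relevant. */
  static int hits(List<Long> recommended, Set<Long> relevant) {
    int n = 0;
    for (Long itemID : recommended) { // assumes no duplicate recommendations
      if (relevant.contains(itemID)) {
        n++;
      }
    }
    return n;
  }

  /** Share of the recommended items that are relevant. */
  static double precision(List<Long> recommended, Set<Long> relevant) {
    if (recommended.isEmpty()) {
      return Double.NaN; // nothing recommended, precision undefined
    }
    return (double) hits(recommended, relevant) / recommended.size();
  }

  /** Share of the relevant items that were recommended. */
  static double recall(List<Long> recommended, Set<Long> relevant) {
    if (relevant.isEmpty()) {
      return Double.NaN; // nothing relevant, recall undefined
    }
    return (double) hits(recommended, relevant) / relevant.size();
  }
}
```

When both the recommended list and the relevant set have exactly "@" entries, the two denominators match and the values coincide. A user who gets only 3 items in a test @5 has a precision denominator of 3 and a recall denominator of 5, so the two can differ.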
                

[jira] [Commented] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Anatoliy Kats (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168231#comment-13168231 ] 

Anatoliy Kats commented on MAHOUT-925:
--------------------------------------

Actually I see the difference between precision and recall still remains, but not in the boolean case.
                

[jira] [Commented] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168389#comment-13168389 ] 

Ted Dunning commented on MAHOUT-925:
------------------------------------

Reach is a nice statistic to have, but I think it can be had more simply than this.

In my experience, quality of recommendations depends very strongly on the number of items in the history.  Where the history is too small, recommendations will typically be pretty poor and above a threshold, they will be as good as they are going to be.  For music, that threshold was 5-10 items, for video it was comparable.

If this is true, then the reach computation can be broken into two parts:

a) what is the threshold?

b) how many people reach the threshold?

The first question is answerable by the standard precision/recall measurement methods, except that the resulting data need to be averaged with an awareness of the history size so that the threshold can be detected.

The second question is simple arithmetic and doesn't need a framework.
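
Both parts can be sketched in a few lines, assuming per-user precision values and history sizes are already available as parallel arrays. HistoryThreshold and its method names are hypothetical, and picking the threshold from the per-size means is left to inspection.

```java
import java.util.TreeMap;

// Hypothetical illustration of the two-part computation.
final class HistoryThreshold {

  private HistoryThreshold() {}

  /**
   * Part (a): mean precision for each history size, sorted by size, so
   * the point where quality levels off can be read from the result.
   * The two arrays are parallel: entry i describes user i.
   */
  static TreeMap<Integer, Double> meanPrecisionByHistorySize(int[] historySizes, double[] precisions) {
    TreeMap<Integer, double[]> sums = new TreeMap<>(); // size -> {sum, count}
    for (int i = 0; i < historySizes.length; i++) {
      double[] acc = sums.computeIfAbsent(historySizes[i], k -> new double[2]);
      acc[0] += precisions[i];
      acc[1] += 1.0;
    }
    TreeMap<Integer, Double> means = new TreeMap<>();
    sums.forEach((size, acc) -> means.put(size, acc[0] / acc[1]));
    return means;
  }

  /** Part (b): fraction of users whose history reaches the threshold. */
  static double fractionAtOrAbove(int[] historySizes, int threshold) {
    if (historySizes.length == 0) {
      return Double.NaN; // undefined without users
    }
    int count = 0;
    for (int size : historySizes) {
      if (size >= threshold) {
        count++;
      }
    }
    return (double) count / historySizes.length;
  }
}
```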
                

[jira] [Updated] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Anatoliy Kats (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anatoliy Kats updated MAHOUT-925:
---------------------------------

    Attachment: MAHOUT-925.patch
    

[jira] [Updated] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Anatoliy Kats (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anatoliy Kats updated MAHOUT-925:
---------------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Updated] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-925:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.6
           Status: Resolved  (was: Patch Available)

Committed a variation on the patch which closes the immediate issue; we could talk about it further though.
                

[jira] [Commented] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167573#comment-13167573 ] 

Sean Owen commented on MAHOUT-925:
----------------------------------

This is fine, though, don't you want to count like so?

    if (numRecommendedItems > 0) {
      reach++;
    }

Otherwise it seems like you're just counting all users, except the ones that the *test framework* couldn't test due to sampling size, which is something else and something you want to ignore in general. (So actually I think I'd count both reach and the number of users that the test framework succeeded for). Is that about right for you?
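
A rough sketch of keeping both counts, assuming users the test framework skips are simply never recorded. ReachTally is a hypothetical name, and this is not the committed code.

```java
// Hypothetical tally for illustration; not the committed Mahout code.
final class ReachTally {

  private int usersEvaluated;           // users the test framework succeeded for
  private int usersWithRecommendations; // of those, users with at least one item

  /** Call once per user that the framework was able to evaluate. */
  void record(int numRecommendedItems) {
    usersEvaluated++;
    if (numRecommendedItems > 0) {
      usersWithRecommendations++;
    }
  }

  int usersEvaluated() {
    return usersEvaluated;
  }

  /** Reach over evaluated users only; NaN when none were evaluated. */
  double reach() {
    if (usersEvaluated == 0) {
      return Double.NaN;
    }
    return (double) usersWithRecommendations / usersEvaluated;
  }
}
```

Reporting the evaluated-user count alongside reach keeps the sampling limitation visible instead of folding it into the statistic.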
                

[jira] [Commented] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168412#comment-13168412 ] 

Sean Owen commented on MAHOUT-925:
----------------------------------

@Anatoliy how would the recommender decide a relevance threshold? That is a priori knowledge, not something the recommender can know. This seems orthogonal to the other issues here. If you're OK with the computation as described I'll commit and of course can iterate on it later.

@Ted that's probably true, though all the same I'm happy to add 3 lines of code and 1 more field to the result to compute 'reach' as Anatoliy describes here. It's very little additional code or complexity.
                

[jira] [Updated] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Anatoliy Kats (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anatoliy Kats updated MAHOUT-925:
---------------------------------

    Attachment: MAHOUT-925.patch
    

[jira] [Commented] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Anatoliy Kats (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168333#comment-13168333 ] 

Anatoliy Kats commented on MAHOUT-925:
--------------------------------------

We agree that there needs to be enough training data for a recommender to output something, but you believe the cutoff should happen in the evaluator, whereas I think the recommender should figure this out by itself, via some sort of a threshold on the expected rating.  For now, the distinction is mostly theoretical, so let's use what we already have.  I'll change the reach calculation as you suggest.
                

[jira] [Commented] (MAHOUT-925) Evaluate the reach of recommender algorithms

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168927#comment-13168927 ] 

Hudson commented on MAHOUT-925:
-------------------------------

Integrated in Mahout-Quality #1252 (See [https://builds.apache.org/job/Mahout-Quality/1252/])
    MAHOUT-925 Add basic idea of 'reach'

srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1213930
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/eval/IRStatistics.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/eval/GenericRecommenderIRStatsEvaluator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/eval/IRStatisticsImpl.java

                