Posted to dev@mahout.apache.org by "Jonathan Young (JIRA)" <ji...@apache.org> on 2010/06/22 16:30:55 UTC

[jira] Updated: (MAHOUT-423) Optimize getNumUsersWithPreferenceFor(long... itemIDs)

     [ https://issues.apache.org/jira/browse/MAHOUT-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Young updated MAHOUT-423:
----------------------------------

    Attachment: MAHOUT-423.patch

This patch is for trunk, and optimizes two special cases: itemIDs.length == 1 (don't create the intersection set, just return the number of preferences) and itemIDs.length == 2 (don't create the intersection set; use the existing sets and the fast intersectionSize() function on FastIDSet).
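
For readers without the attachment, the two special cases can be sketched roughly as follows. This is not the patch itself: the class name NumUsersSketch, the usersByItem map (item ID to the set of user IDs with a preference for it), and the use of java.util.Set in place of FastIDSet are all assumptions made so the example stands on its own.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the data model; the real code uses Mahout's FastIDSet.
final class NumUsersSketch {

    // Counts the common elements without building a new set.
    // Iterates over the smaller set and probes the larger one.
    static int intersectionSize(Set<Long> a, Set<Long> b) {
        Set<Long> smaller = a.size() <= b.size() ? a : b;
        Set<Long> larger = smaller == a ? b : a;
        int count = 0;
        for (Long id : smaller) {
            if (larger.contains(id)) {
                count++;
            }
        }
        return count;
    }

    // usersByItem is an assumed structure: item ID -> IDs of users who prefer that item.
    static int numUsersWithPreferenceFor(Map<Long, Set<Long>> usersByItem, long... itemIDs) {
        if (itemIDs.length == 0) {
            throw new IllegalArgumentException("itemIDs must not be empty");
        }
        Set<Long> first = usersByItem.get(itemIDs[0]);
        if (first == null) {
            return 0;
        }
        // Case 1: a single item, so the answer is just the size of its user set.
        if (itemIDs.length == 1) {
            return first.size();
        }
        // Case 2: two items, so count the overlap without allocating an intersection set.
        if (itemIDs.length == 2) {
            Set<Long> second = usersByItem.get(itemIDs[1]);
            return second == null ? 0 : intersectionSize(first, second);
        }
        // General case: build the intersection on a copy, stopping early once it is empty.
        Set<Long> intersection = new HashSet<>(first);
        for (int i = 1; i < itemIDs.length && !intersection.isEmpty(); i++) {
            Set<Long> next = usersByItem.get(itemIDs[i]);
            if (next == null) {
                return 0;
            }
            intersection.retainAll(next);
        }
        return intersection.size();
    }
}
```

The point of the two fast paths is that neither allocates or copies a set, which matters when the method is called once per item pair during similarity computation.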

> Optimize getNumUsersWithPreferenceFor(long... itemIDs)
> ------------------------------------------------------
>
>                 Key: MAHOUT-423
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-423
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.3
>            Reporter: Jonathan Young
>         Attachments: MAHOUT-423.patch
>
>
> I ran a simple collaborative filtering application using a GenericBooleanPrefDataModel built from (a subset of) the Netflix data, Tanimoto similarity, and the GenericItemBasedRecommender, and then called recommender.mostSimilarItems() many times.
> Profiling indicated that the majority of the time was spent in GenericBooleanPrefDataModel.getNumUsersWithPreferenceFor(long... itemIDs).  The version in GenericDataModel is optimized for the cases of one and two itemIDs, but the version in GenericBooleanPrefDataModel always computes the intersection set.
> I can create a patch which optimizes the two cases of itemIDs.length == 1 and itemIDs.length == 2 (similar to the version in GenericDataModel), but perhaps the code should be refactored if these are really the most common cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.