Posted to dev@mahout.apache.org by "Sebastian Schelter (JIRA)" <ji...@apache.org> on 2013/07/21 20:26:48 UTC

[jira] [Updated] (MAHOUT-1289) Move downsampling code into RowSimilarityJob

     [ https://issues.apache.org/jira/browse/MAHOUT-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-1289:
---------------------------------------

    Description: 
When computing similarities with RowSimilarityJob, downsampling highly frequent things is crucial for performance. At the moment, this is done by the data preparation code for collaborative filtering.

We should move the downsampling directly into RowSimilarityJob as we've seen a lot of cases where users want to directly use it.

Furthermore, it should be possible to fix the random seed for the sampling to be able to conduct repeatable experiments.

  was:
When computing similarities with RowSimilarityJob, downsampling highly frequent things is crucial for performance. At the moment, this is done by the data preparation code for collaborative filtering.

We should move the downsampling directly into RowSimilarityJob as we've seen a lot of cases where users want to directly use it.

    
> Move downsampling code into RowSimilarityJob
> --------------------------------------------
>
>                 Key: MAHOUT-1289
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1289
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.9
>
>
> When computing similarities with RowSimilarityJob, downsampling highly frequent things is crucial for performance. At the moment, this is done by the data preparation code for collaborative filtering.
> We should move the downsampling directly into RowSimilarityJob as we've seen a lot of cases where users want to directly use it.
> Furthermore, it should be possible to fix the random seed for the sampling to be able to conduct repeatable experiments.
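
The two requests in the description (cap the size of very frequent rows, and make the sampling repeatable through a fixed seed) can be illustrated with a minimal sketch. This is not the code from RowSimilarityJob or from the collaborative filtering preparation job. The class DownsamplingSketch, the method downsample, and the parameter maxEntries are hypothetical names chosen for illustration, and per-entry Bernoulli sampling is only one possible strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Mahout: seeded downsampling of a single row.
class DownsamplingSketch {

  // Keeps each entry with probability maxEntries / row.size(), so a long row
  // shrinks to roughly maxEntries entries in expectation (assumed strategy).
  static List<Integer> downsample(List<Integer> row, int maxEntries, Random random) {
    if (row.size() <= maxEntries) {
      return new ArrayList<>(row); // short rows pass through unchanged
    }
    double keepProbability = (double) maxEntries / row.size();
    List<Integer> sampled = new ArrayList<>();
    for (Integer entry : row) {
      if (random.nextDouble() < keepProbability) {
        sampled.add(entry);
      }
    }
    return sampled;
  }

  public static void main(String[] args) {
    List<Integer> row = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      row.add(i);
    }
    long seed = 42L; // fixing the seed makes the experiment repeatable
    List<Integer> first = downsample(row, 100, new Random(seed));
    List<Integer> second = downsample(row, 100, new Random(seed));
    System.out.println(first.equals(second)); // prints "true"
  }
}
```

Passing the Random instance in, rather than creating it inside the method, lets the caller decide whether to use a fixed seed for experiments or a fresh one for production runs.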
