Posted to dev@mahout.apache.org by "Lance Norskog (JIRA)" <ji...@apache.org> on 2011/05/31 00:07:47 UTC
[jira] [Reopened] (MAHOUT-676) Random samplers in a modular library
[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lance Norskog reopened MAHOUT-676:
----------------------------------
As mentioned, I got interested again.
> Random samplers in a modular library
> ------------------------------------
>
> Key: MAHOUT-676
> URL: https://issues.apache.org/jira/browse/MAHOUT-676
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Reporter: Lance Norskog
> Priority: Minor
> Attachments: MAHOUT-676.patch, Sampler.patch
>
>
> This is a modular suite of samplers. It supplies the ability to throw away samples in a useful way.
> Here is a use case: for my recommendations, I want user activity to decide the amount of influence on the results. Bucketing users by the number of movies they watch, the shares are: 1-5 is 20%, 6-15 is 50%, 16-30 is 30%, and users who watch over 30 movies are not useful.
> * If I know the input distribution, I can supply a function to the Slice sampler to give this distribution.
> * If I don't know the distribution, I can create a Reservoir sampler for each of the three buckets. After reading the whole set, I check the sizes of the various buckets and solve for my distribution. This gives the number of users to pull from each bucket.
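
The second approach above (one reservoir per bucket, then solving for the per-bucket counts) can be sketched roughly as follows. This is an illustration only, not the code in the attached patches: the class name, method names, and the bucket boundaries (in particular, treating the third bucket as 16-30) are assumptions made for the example. The reservoir uses the standard "Algorithm R" replacement rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; none of these names come from the MAHOUT-676 patches.
class BucketedReservoirSketch {

    /** Fixed-capacity uniform sample of a stream of unknown length (Algorithm R). */
    static final class Reservoir<T> {
        private final List<T> sample = new ArrayList<>();
        private final int capacity;
        private final Random random;
        private long seen = 0;

        Reservoir(int capacity, Random random) {
            this.capacity = capacity;
            this.random = random;
        }

        void offer(T item) {
            seen++;
            if (sample.size() < capacity) {
                sample.add(item);                 // still filling the reservoir
            } else {
                // Pick a slot uniformly in [0, seen); the new item survives
                // only if that slot falls inside the reservoir.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < capacity) {
                    sample.set((int) slot, item);
                }
            }
        }

        long seen() { return seen; }

        List<T> sample() { return sample; }
    }

    /** Maps a movie count to a bucket index, or -1 for users to discard (assumed boundaries). */
    static int bucketOf(int moviesWatched) {
        if (moviesWatched < 1 || moviesWatched > 30) return -1;
        if (moviesWatched <= 5) return 0;
        if (moviesWatched <= 15) return 1;
        return 2;
    }

    /**
     * Given how many users each bucket actually holds and the target shares,
     * returns the largest per-bucket counts that respect those shares.
     */
    static int[] solveQuotas(int[] available, double[] shares) {
        // The scarcest bucket, relative to its share, limits the total output size.
        double total = Double.POSITIVE_INFINITY;
        for (int i = 0; i < shares.length; i++) {
            if (shares[i] > 0) total = Math.min(total, available[i] / shares[i]);
        }
        int[] quotas = new int[shares.length];
        for (int i = 0; i < shares.length; i++) {
            // The small epsilon guards against floating-point results just below an integer.
            int quota = (int) Math.floor(total * shares[i] + 1e-9);
            quotas[i] = Math.min(available[i], quota);
        }
        return quotas;
    }
}
```

In use, each user would be offered to the reservoir chosen by bucketOf (skipping -1), and after the full pass solveQuotas would be called with each reservoir's sample size and the shares {0.2, 0.5, 0.3}; the first quota-many entries of each reservoir's sample then form the output. Taking a prefix of the sample is a simplification that is adequate for a sketch; shuffling the sample first would be the more careful choice.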
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira