Posted to dev@bigtop.apache.org by "jay vyas (JIRA)" <ji...@apache.org> on 2014/12/18 03:20:13 UTC

[jira] [Comment Edited] (BIGTOP-1271) BigPetStore: Embed user "types" into the generated data.

    [ https://issues.apache.org/jira/browse/BIGTOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251026#comment-14251026 ] 

jay vyas edited comment on BIGTOP-1271 at 12/18/14 2:19 AM:
------------------------------------------------------------

Data set generation is now handled by the external data generator library (https://bintray.com/rnowling/bigpetstore/bigpetstore-data-generator/view), so this issue is obsolete.


was (Author: jayunit100):
data set generation is no handled by the datagenerator external library...   https://bintray.com/rnowling/bigpetstore/bigpetstore-data-generator/view  this is obsoleted.

> BigPetStore: Embed user "types" into the generated data.
> --------------------------------------------------------
>
>                 Key: BIGTOP-1271
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1271
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: blueprints
>    Affects Versions: backlog
>            Reporter: jay vyas
>
> The data set generation in BigPetStore results in data with temporal and geographic patterns; however, there are no "personal" biases in the data.
> We need to add personal biases into the data so that the Mahout recommender is capable of teasing out statistically significant product clusters for users. 
> A simple implementation:  
> {noformat} 
> given 2 "types" of customers (i.e. dog people, cat people)
> t = hash (customer_name) % 2
> if(t==0)
>    customer buys only dog products
> if(t==1) 
>    customer buys only cat products
> {noformat}
> This approach will easily scale and consistently embed profiles into each person's purchases.  Obviously, using some OO magic we can create customers who buy both cat and dog products... but the basic approach still remains (hash code -> customer type -> product biases).
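
The hash code -> customer type -> product biases idea above can be sketched in a few lines of Python. This is only an illustration, not BigPetStore's actual generator: the names CUSTOMER_TYPES, customer_type and allowed_products are hypothetical, products are assumed to be (name, category) pairs, and zlib.crc32 stands in for the hash because Python's built-in hash() of a string can differ between runs.

```python
import zlib

# Hypothetical type labels; the issue only mentions "dog people" and "cat people".
CUSTOMER_TYPES = ["dog", "cat"]


def customer_type(customer_name: str) -> str:
    """Map a customer name to a stable type: t = hash(customer_name) % 2."""
    # crc32 is deterministic across processes, so a given customer
    # always lands in the same type on every run.
    t = zlib.crc32(customer_name.encode("utf-8")) % len(CUSTOMER_TYPES)
    return CUSTOMER_TYPES[t]


def allowed_products(customer_name, products):
    """Keep only products whose category matches the customer's type.

    `products` is assumed to be a list of (product_name, category) tuples.
    """
    wanted = customer_type(customer_name)
    return [p for p in products if p[1] == wanted]


if __name__ == "__main__":
    catalog = [("dog food", "dog"), ("leash", "dog"), ("cat litter", "cat")]
    for name in ["alice", "bob", "carol"]:
        print(name, customer_type(name), allowed_products(name, catalog))
```

Adding more entries to CUSTOMER_TYPES (for example a type that buys both categories) would extend the same scheme without changing the hashing step, assuming the filter is adjusted to accept more than one category per type.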


