Posted to user@mahout.apache.org by Bejoy <be...@hotmail.com> on 2011/01/31 15:05:17 UTC

Quality of Data Set for recommendations

Hi Experts,
            I have a question about the input data set used for generating
recommendations (collaborative filtering). The scenario: I need to
recommend items to users based on warehoused data from a shopping
portal. The data set is Boolean. While examining the data set to be used for
recommendations, I noticed that the same user sometimes purchases the
same item multiple times. Hence the same user-item pair occurs multiple
times in the data set:

User1,Item1
User2,Item2
user3,Item3
User1,Item1
User1,Item2
User1,Item1
User2,Item3
User1,Item1

   Here in the sample data set the 'User1,Item1' pair occurs multiple times.
Would this repetition affect the recommendations in any way? Do I have to
employ a file preprocessor to eliminate these repetitions before providing
the data set to Mahout for computing recommendations?
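
For illustration only, a file preprocessor of the kind asked about could look
like the minimal Python sketch below. It assumes one comma-separated user,item
pair per line, as in the sample above; the function name dedupe_pairs is
hypothetical and not part of Mahout. It drops exact repeats and keeps the
first-seen order.

```python
def dedupe_pairs(lines):
    """Return 'user,item' lines with exact repeats removed, in first-seen order."""
    seen = set()
    unique = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split into exactly two fields: user and item.
        user, item = (field.strip() for field in line.split(",", 1))
        pair = (user, item)
        if pair not in seen:
            seen.add(pair)
            unique.append(f"{user},{item}")
    return unique


# The sample data from the message above.
sample = [
    "User1,Item1", "User2,Item2", "user3,Item3", "User1,Item1",
    "User1,Item2", "User1,Item1", "User2,Item3", "User1,Item1",
]
deduped = dedupe_pairs(sample)
print("\n".join(deduped))
```

On the sample, the four 'User1,Item1' lines collapse into one, leaving five
distinct pairs. The comparison is case-sensitive, so 'user3' and 'User3' would
be treated as different users.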

Please advise. 
Thank you

Regards
Bejoy.K.S
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Quality-of-Data-Set-for-reccommendations-tp2389492p2389492.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Quality of Data Set for recommendations

Posted by Steven Bourke <sb...@gmail.com>.
You shouldn't need to run any preprocessing on the data for Mahout to
process the values (using GenericBooleanPrefDataModel). However, for the sake
of your recommendations it could be worthwhile to use the repeated purchases
of an item as an additional signal when evaluating recommendation scores.
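
One way to read the suggestion above is to turn the number of repeats into a
per-pair weight instead of discarding it. The Python sketch below is only an
illustration under that assumption: the helper names count_pairs and
to_weighted_lines are hypothetical, and the three-column user,item,count
output is an assumed layout for a later, non-Boolean loading step, not
something stated in this thread.

```python
from collections import Counter


def count_pairs(lines):
    """Count how often each (user, item) pair occurs in 'user,item' lines."""
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        user, item = (field.strip() for field in line.split(",", 1))
        counts[(user, item)] += 1
    return counts


def to_weighted_lines(counts):
    """Render counts as 'user,item,count' lines (assumed output layout)."""
    return [f"{user},{item},{n}" for (user, item), n in counts.items()]


# The sample data from the original question.
sample = [
    "User1,Item1", "User2,Item2", "user3,Item3", "User1,Item1",
    "User1,Item2", "User1,Item1", "User2,Item3", "User1,Item1",
]
pair_counts = count_pairs(sample)
weighted = to_weighted_lines(pair_counts)
print("\n".join(weighted))
```

Here 'User1,Item1' gets a count of 4 and every other pair a count of 1.
Whether raw counts, capped counts, or some other scaling works best as a
preference value is a modelling choice that would need to be evaluated on the
real data.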


On Mon, Jan 31, 2011 at 2:12 PM, Bejoy <be...@hotmail.com> wrote:

>
> I need to make both User Based and Item Based recommendations on such a
> data set. So it would be great if you could brief me on the impact of such
> data sets on both types of recommendations.
>

Re: Quality of Data Set for recommendations

Posted by Bejoy <be...@hotmail.com>.
I need to make both User Based and Item Based recommendations on such a data
set. So it would be great if you could brief me on the impact of such data
sets on both types of recommendations.