Posted to user@mahout.apache.org by Yash Patel <ya...@gmail.com> on 2014/11/26 20:16:25 UTC

User based recommender

Dear Mahout Team,

I am a student new to machine learning, and I am trying to build a
user-based recommender using Mahout.

My dataset is a CSV file, but many of its fields are text, and I
understand Mahout needs numeric values.

Can you give me a head start on where I should begin and what kind of
tools I need to parse the text columns?

Also, an idea of which classifiers or clustering methods I should use
would be highly appreciated.


Best regards,
Yash Patel
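[For readers following along: Mahout's Taste recommenders read numeric user and item IDs, so a common first step is a small dictionary that assigns a long to each distinct text value. A minimal sketch in Java; the class and method names are illustrative, not Mahout API.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: assign stable numeric IDs to text keys so a CSV with text
// columns can be rewritten as the numeric "userID,itemID" input Mahout expects.
class IdMapper {
    private final Map<String, Long> ids = new LinkedHashMap<>();

    // Returns the ID previously assigned to key, or assigns the next free one.
    long idFor(String key) {
        return ids.computeIfAbsent(key, k -> (long) ids.size());
    }

    public static void main(String[] args) {
        IdMapper users = new IdMapper(), items = new IdMapper();
        String[][] rows = { {"alice", "stroller"}, {"bob", "car seat"}, {"alice", "car seat"} };
        // Each text row becomes a numeric "userID,itemID" line.
        for (String[] row : rows)
            System.out.println(users.idFor(row[0]) + "," + items.idFor(row[1]));
    }
}
```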

Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
Calculating similarity using multiple column values is what I had in
mind. I looked through the example, but content-based filtering is only
mentioned there, not implemented explicitly.

Can you guide me to a working example, or do I need to use
classification or clustering algorithms?
Also, if I do, can I use the results with the recommenders
provided in Mahout?


Best Regards,
Yash Patel

On Wed, Dec 3, 2014 at 3:43 PM, parnab kumar <pa...@gmail.com> wrote:


Re: User based recommender

Posted by parnab kumar <pa...@gmail.com>.
1. Why not use the other columns as evidence and come up with a
preference score, <UID> <ITEMID> <PREF_SCORE>, and see if results
improve? Maybe you can use other machine learning algorithms to generate
such preference scores.

2. Another way may be to implement a custom similarity score, rather
than the ones that ship with Mahout, in which you use these column
values to decide on the similarity of users. Kindly have a look at
Mahout in Action; there is an example for dating recommendation. Your
<USERID, ITEMID> problem can be mapped back to the problem mentioned
there. Try implementing the similarity score using the other column
values.

Maybe some expert in this area can come up with a better solution. If I
were you, I would certainly test the waters the way I mentioned.

Parnab
CSE, IIT Kharagpur
BIS, University College Cork
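[To make suggestion 1 concrete, here is one possible shape for deriving a <UID> <ITEMID> <PREF_SCORE> line from extra transaction columns. The weighting is an arbitrary placeholder to tune, not anything Mahout prescribes.]

```java
import java.util.Locale;

// One possible way to fold extra evidence columns (quantity, sales amount)
// into a single preference score. The weights here are illustrative guesses.
class PrefScore {
    static double score(int quantity, double salesAmount) {
        // Damp the monetary value so one expensive purchase doesn't dominate.
        return quantity + 0.1 * Math.log1p(salesAmount);
    }

    public static void main(String[] args) {
        // Emit a "<UID> <ITEMID> <PREF_SCORE>" line for one transaction.
        System.out.printf(Locale.ROOT, "%s %s %.3f%n", "user1", "item42", score(2, 59.90));
    }
}
```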

Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
User1 purchases = infant car seat, infant stroller
User2 purchases = infant car seat, infant stroller, infant crib mobile

The obvious recommendation for User1 is an infant crib mobile. From the purchase history the users look similar. Here similarity is in “taste”. User or item information that does not relate to taste may be misleading for recs. If you look at their profiles:

User1: male, 55 years old, upper 75% income
User2: female, 29 years old, lower 25% income

User1 is actually a doting grandfather, User2 a doting mother. Their profiles are quite dissimilar though their taste is similar. 

The point being that those other pieces of data may not relate to user similarity *of taste*. Going through the cross-recommendation process applies cooccurrence analysis to the data, which checks whether the secondary data correlates in an important way with the action you know is important.

For this reason it’s usually best to start out ignoring that information and using just <UID> <ITEMID> for the important action.

Later you may find uses for the extra data, or may consider viewing or purchasing from a certain category as a secondary action and use cross-recommendations to improve things.
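[The taste-cooccurrence point above can be sketched in a few lines of Java. This is a toy illustration of the idea, not the Mahout implementation.]

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy cooccurrence recommender: score items the target user hasn't bought
// by how much each other buyer's basket overlaps with the target's basket.
class Cooccur {
    static Map<String, Integer> recommend(Map<String, Set<String>> purchases, String user) {
        Set<String> mine = purchases.get(user);
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : purchases.entrySet()) {
            if (other.getKey().equals(user)) continue;
            int overlap = 0;                            // shared-"taste" evidence
            for (String item : other.getValue()) if (mine.contains(item)) overlap++;
            if (overlap == 0) continue;                 // no common taste, skip
            for (String item : other.getValue())
                if (!mine.contains(item)) scores.merge(item, overlap, Integer::sum);
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> p = new HashMap<>();
        p.put("user1", new HashSet<>(Arrays.asList("infant car seat", "infant stroller")));
        p.put("user2", new HashSet<>(Arrays.asList("infant car seat", "infant stroller", "infant crib mobile")));
        // user2 shares two purchases with user1, so the crib mobile scores 2.
        System.out.println(recommend(p, "user1"));
    }
}
```

Note that the profile columns (age, gender, income) never enter the computation; only shared behavior does.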

On Dec 4, 2014, at 7:17 AM, Yash Patel <ya...@gmail.com> wrote:

Cross-recommenders don't seem applicable because this dataset doesn't
represent different actions by a user; it just contains transaction
history (i.e. customer ID, item ID, shipping location, sales amount of
that item, item category, etc.).

Maybe location and sales per item (similarity might lead to knowledge of
people who share the same purchasing patterns) could be used.


On Wed, Dec 3, 2014 at 5:28 PM, Ted Dunning <te...@gmail.com> wrote:

> On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel <ya...@gmail.com>
> wrote:
> 
>> I have multiple different columns, such as category, shipping location,
>> item price, online user, etc.
>> 
>> How can I use all these different columns to improve recommendation
>> quality (i.e. calculate more precise similarity between users by using
>> location and item price)?
>> 
> 
> For some kinds of information, you can build cross recommenders off of that
> other information.  That incorporates this other information in an
> item-based system.
> 
> Simply hand coding a similarity usually doesn't work well.  The problem is
> that you don't really know which factors really represent actionable and
> non-redundant user similarity.
> 


Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
BTW you may be able to just run the same CSV through multiple times and pick a different item-ID column for each “action”. Note that here “csv” means a text file with some delimiter, not the full-spec CSV with headers, quoted values, and escaped characters.

On Dec 8, 2014, at 4:11 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:


Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
No classifier; just turn the one CSV into several, each being a collection for one action:

user ID,item ID

Where the item ID is whatever the action corresponds to. For instance, <user ID>,<location ID> for being at a location, or <user ID>,<item ID> for a purchase, etc. These can go directly into the command line of spark-itemsimilarity: --input will always be the file with purchases, --input2 will be the file with the secondary action.
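[A sketch of that preprocessing step. The column positions are hypothetical (column 0 = customer ID, 1 = item ID, 2 = shipping location); adjust to the real layout.]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Turn one transaction CSV into several "user ID,item ID" files, one per
// action, by picking a different column as the "item" for that action.
class SplitActions {
    static List<String> extract(List<String> rows, int col) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] f = row.split(",");
            out.add(f[0] + "," + f[col]);   // f[0] assumed to be the customer ID
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = Arrays.asList("u1,item9,DE-10115", "u2,item3,FR-75002");
        // purchases.csv would feed --input, locations.csv would feed --input2.
        Files.write(Paths.get("purchases.csv"), extract(rows, 1));
        Files.write(Paths.get("locations.csv"), extract(rows, 2));
    }
}
```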

On Dec 8, 2014, at 1:22 AM, Yash Patel <ya...@gmail.com> wrote:


Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
Most columns have different values. When you say preprocess, do you mean
using classifiers?

My dataset is highly structured in nature, so I don't understand how a
classifier would work.
 On Dec 8, 2014 2:20 AM, "Pat Ferrel" <pa...@occamsmachete.com> wrote:


Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
If there is some “filter” column that flags one type of item or another then yes. Otherwise you’ll have to preprocess your data for input.

On Dec 7, 2014, at 2:27 PM, Yash Patel <ya...@gmail.com> wrote:


Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
Will cross-recommendation still work, considering that item similarity
checks multiple columns for items and my dataset has only one column for
items? It contains the different item IDs.




On Sun, Dec 7, 2014 at 5:26 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:


Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
To use cross-recommendations with multiple actions you may be able to get away with using the pre-packaged command line job “spark-itemsimilarity”. At one point you said you were more interested in the Mahout Hadoop MapReduce recommender, which cannot create these cross-recommendations.

I don’t see any need to use the interactive Mahout or Spark shell. Calling Scala from Java is pretty complex so I’d recommend starting from the running driver so you have a base of Scala code to start from. Calling Java from Scala is dead simple, it’s done throughout Mahout code. This should help make Scala a little less daunting. I use IntelliJ and there should be no problem using Eclipse in the same manner. 


On Dec 6, 2014, at 3:55 PM, Yash Patel <ya...@gmail.com> wrote:


Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
I have something that shows the user locations; however, is it possible
to implement this without using the Apache Spark shell, as I found it
quite confusing to use with no examples?

I have a Windows environment, and I am using Java in Eclipse Luna to
code the recommender.
On Dec 6, 2014 9:09 PM, "Pat Ferrel" <pa...@occamsmachete.com> wrote:

> On Dec 5, 2014, at 11:10 AM, Ted Dunning <te...@gmail.com> wrote:
>
> Cross recommendation can apply if you use the multiple kinds of columns to
> impute actions relative to characteristics.  That is, people at this
> location buy this item.  Then when you do the actual query, the query
> contains detailed history of the person, but also recent location history.
>
>
>
> On Thu, Dec 4, 2014 at 7:17 AM, Yash Patel <ya...@gmail.com>
> wrote:
>
> > Cross Recommendors dont seem applicable because this dataset doesn't
> > represent different actions by a user,it just contains transaction
> > history.(ie.customer id,item id,shipping location,sales amount of that
> > item,item category etc)
> >
> > Maybe location,sales per item(similarity might lead to knowledge of
> people
> > who share same purchasing patterns) etc.
> >
> >
> > On Wed, Dec 3, 2014 at 5:28 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >
> >> On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel <ya...@gmail.com>
> >> wrote:
> >>
> >>> I have multiple different columns such as category,shipping
> > location,item
> >>> price,online user, etc.
> >>>
> >>> How can i use all these different columns and improve recommendation
> >>> quality(ie.calculate more precise similarity between users by use of
> >>> location,item price) ?
> >>>
> >>
> >> For some kinds of information, you can build cross recommenders off of
> > that
> >> other information.  That incorporates this other information in an
> >> item-based system.
> >>
> >> Simply hand coding a similarity usually doesn't work well.  The problem
> > is
> >> that you don't really know which factors really represent actionable and
> >> non-redundant user similarity.
> >>
> >
>
>

Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
You can often think of or re-phrase a piece of data (a column in your interaction data) as an action, like “being at a location”. Then use cross-cooccurrence to calculate a cross-indicator. So the location can be used to recommend purchases.

If you do this, the location should be something that can have cooccurrence, so instead of lat-lon some part of an address. Maybe country+postal-code would be good. Something unique that identifies a location where other users can be. 
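As a rough sketch of the idea above, the same transaction file can be split into two "action" lists, one for purchases and one for "was at location", which a cross-occurrence job such as spark-itemsimilarity could then consume. The column positions and the country+postal-code layout here are assumptions for illustration, not a fixed format:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical transform: split one transaction CSV into two action lists,
// a purchase action (user, item) and a location "action"
// (user, country + postal code), so a cross-occurrence job can use
// location evidence to recommend purchases.
public class ActionSplit {
    // Input rows are assumed to look like: customerId,itemId,country,postalCode
    public static List<String[]> purchases(List<String> rows) {
        List<String[]> out = new ArrayList<>();
        for (String r : rows) {
            String[] f = r.split(",");
            out.add(new String[] { f[0], f[1] });   // user -> item bought
        }
        return out;
    }

    public static List<String[]> locations(List<String> rows) {
        List<String[]> out = new ArrayList<>();
        for (String r : rows) {
            String[] f = r.split(",");
            // country + postal code gives a location id coarse enough
            // for different users to actually cooccur at it
            out.add(new String[] { f[0], f[2] + "-" + f[3] });
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "u1,itemA,PT,1000",
            "u2,itemA,PT,1000");
        System.out.println(Arrays.toString(purchases(rows).get(0))); // [u1, itemA]
        System.out.println(Arrays.toString(locations(rows).get(1))); // [u2, PT-1000]
    }
}
```

Each list would then be written out as its own CSV for the cross-occurrence job.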


On Dec 5, 2014, at 11:10 AM, Ted Dunning <te...@gmail.com> wrote:

Cross recommendation can apply if you use the multiple kinds of columns to
impute actions relative to characteristics.  That is, people at this
location buy this item.  Then when you do the actual query, the query
contains detailed history of the person, but also recent location history.



On Thu, Dec 4, 2014 at 7:17 AM, Yash Patel <ya...@gmail.com> wrote:

> Cross Recommendors dont seem applicable because this dataset doesn't
> represent different actions by a user,it just contains transaction
> history.(ie.customer id,item id,shipping location,sales amount of that
> item,item category etc)
> 
> Maybe location,sales per item(similarity might lead to knowledge of people
> who share same purchasing patterns) etc.
> 
> 
> On Wed, Dec 3, 2014 at 5:28 PM, Ted Dunning <te...@gmail.com> wrote:
> 
>> On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel <ya...@gmail.com>
>> wrote:
>> 
>>> I have multiple different columns such as category,shipping
> location,item
>>> price,online user, etc.
>>> 
>>> How can i use all these different columns and improve recommendation
>>> quality(ie.calculate more precise similarity between users by use of
>>> location,item price) ?
>>> 
>> 
>> For some kinds of information, you can build cross recommenders off of
> that
>> other information.  That incorporates this other information in an
>> item-based system.
>> 
>> Simply hand coding a similarity usually doesn't work well.  The problem
> is
>> that you don't really know which factors really represent actionable and
>> non-redundant user similarity.
>> 
> 


Re: User based recommender

Posted by Ted Dunning <te...@gmail.com>.
Cross recommendation can apply if you use the multiple kinds of columns to
impute actions relative to characteristics.  That is, people at this
location buy this item.  Then when you do the actual query, the query
contains detailed history of the person, but also recent location history.
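The query side Ted describes might be sketched like this. The field names ("indicators", "location_indicators") are invented for illustration; in practice the query string would go to a search engine such as Solr or Elasticsearch, where terms are OR'd by default:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the query side: the recommendation query combines the user's
// item history (against the purchase-indicator field) with recent location
// history (against the location cross-indicator field). Field names are
// assumptions, not a real schema.
public class IndicatorQuery {
    public static String build(List<String> itemHistory, List<String> locationHistory) {
        StringBuilder q = new StringBuilder();
        for (String item : itemHistory)
            q.append("indicators:").append(item).append(' ');
        for (String loc : locationHistory)
            q.append("location_indicators:").append(loc).append(' ');
        return q.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("itemA", "itemB"),
                                 Arrays.asList("PT-1000")));
        // indicators:itemA indicators:itemB location_indicators:PT-1000
    }
}
```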



On Thu, Dec 4, 2014 at 7:17 AM, Yash Patel <ya...@gmail.com> wrote:

> Cross Recommendors dont seem applicable because this dataset doesn't
> represent different actions by a user,it just contains transaction
> history.(ie.customer id,item id,shipping location,sales amount of that
> item,item category etc)
>
> Maybe location,sales per item(similarity might lead to knowledge of people
> who share same purchasing patterns) etc.
>
>
> On Wed, Dec 3, 2014 at 5:28 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel <ya...@gmail.com>
> > wrote:
> >
> > > I have multiple different columns such as category,shipping
> location,item
> > > price,online user, etc.
> > >
> > > How can i use all these different columns and improve recommendation
> > > quality(ie.calculate more precise similarity between users by use of
> > > location,item price) ?
> > >
> >
> > For some kinds of information, you can build cross recommenders off of
> that
> > other information.  That incorporates this other information in an
> > item-based system.
> >
> > Simply hand coding a similarity usually doesn't work well.  The problem
> is
> > that you don't really know which factors really represent actionable and
> > non-redundant user similarity.
> >
>

Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
Cross recommenders don't seem applicable, because this dataset doesn't
represent different actions by a user; it just contains transaction
history (i.e. customer id, item id, shipping location, sales amount of that
item, item category, etc.)

Maybe location and sales per item could help (similarity there might reveal
people who share the same purchasing patterns), etc.


On Wed, Dec 3, 2014 at 5:28 PM, Ted Dunning <te...@gmail.com> wrote:

> On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel <ya...@gmail.com>
> wrote:
>
> > I have multiple different columns such as category,shipping location,item
> > price,online user, etc.
> >
> > How can i use all these different columns and improve recommendation
> > quality(ie.calculate more precise similarity between users by use of
> > location,item price) ?
> >
>
> For some kinds of information, you can build cross recommenders off of that
> other information.  That incorporates this other information in an
> item-based system.
>
> Simply hand coding a similarity usually doesn't work well.  The problem is
> that you don't really know which factors really represent actionable and
> non-redundant user similarity.
>

Re: User based recommender

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel <ya...@gmail.com> wrote:

> I have multiple different columns such as category,shipping location,item
> price,online user, etc.
>
> How can i use all these different columns and improve recommendation
> quality(ie.calculate more precise similarity between users by use of
> location,item price) ?
>

For some kinds of information, you can build cross recommenders off of that
other information.  That incorporates this other information in an
item-based system.

Simply hand coding a similarity usually doesn't work well.  The problem is
that you don't really know which factors really represent actionable and
non-redundant user similarity.
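For context on why cooccurrence works where a hand-coded similarity tends not to: Mahout's cooccurrence recommenders score item pairs with the log-likelihood ratio (G²) test, keeping only pairs that cooccur anomalously often rather than weighting factors by hand. A compact re-implementation of the standard formula (not Mahout's actual code, though it follows the same entropy decomposition):

```java
// G^2 (log-likelihood ratio) for a 2x2 cooccurrence table.
public class Llr {
    static double xlogx(double x) { return x == 0 ? 0 : x * Math.log(x); }

    static double entropy(double... counts) {
        double sum = 0, e = 0;
        for (double c : counts) { sum += c; e += xlogx(c); }
        return xlogx(sum) - e;
    }

    // k11 = both items seen together, k12/k21 = one without the other,
    // k22 = neither; all counted over users
    public static double llr(double k11, double k12, double k21, double k22) {
        double rowE = entropy(k11 + k12, k21 + k22);
        double colE = entropy(k11 + k21, k12 + k22);
        double matE = entropy(k11, k12, k21, k22);
        return 2 * (rowE + colE - matE);
    }

    public static void main(String[] args) {
        // strong cooccurrence scores high; statistical independence scores ~0
        System.out.println(llr(100, 10, 10, 1000));
        System.out.println(llr(10, 10, 10, 10));
    }
}
```

High-scoring pairs become the "indicators" for an item; everything else is treated as noise.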

Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
I figured out how to parse CSV files, build a map of user id and item id, and
build a basic recommender, which gives a user a recommendation of some items.

This method isn't able to utilize all my data, though, since it's only
using two columns.
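What that two-column recommender does can be illustrated with a toy sketch. This is not Mahout's API (in Mahout the equivalent pieces are FileDataModel, a UserSimilarity, a UserNeighborhood, and GenericUserBasedRecommender); it just shows the idea on (user, item) pairs:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration (not Mahout's API) of a user-based recommender on two
// columns: group items by user, find the most similar other user, and
// recommend what they bought that this user hasn't.
public class ToyUserBased {
    public static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a); inter.retainAll(b);
        Set<String> union = new HashSet<>(a); union.addAll(b);
        return union.isEmpty() ? 0 : (double) inter.size() / union.size();
    }

    public static Set<String> recommend(String user, Map<String, Set<String>> prefs) {
        String best = null; double bestSim = -1;
        for (String other : prefs.keySet()) {
            if (other.equals(user)) continue;
            double s = jaccard(prefs.get(user), prefs.get(other));
            if (s > bestSim) { bestSim = s; best = other; }
        }
        Set<String> recs = new TreeSet<>(prefs.get(best));
        recs.removeAll(prefs.get(user));   // keep only unseen items
        return recs;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> prefs = new HashMap<>();
        prefs.put("u1", new HashSet<>(Arrays.asList("a", "b")));
        prefs.put("u2", new HashSet<>(Arrays.asList("a", "b", "c")));
        prefs.put("u3", new HashSet<>(Arrays.asList("x", "y")));
        System.out.println(recommend("u1", prefs));   // [c]
    }
}
```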

I have multiple different columns, such as category, shipping location, item
price, online user, etc.

How can I use all these different columns to improve recommendation
quality (i.e. calculate more precise similarity between users by using
location and item price)?

Best Regards,
Yash Patel



On Sat, Nov 29, 2014 at 10:47 PM, Yash Patel <ya...@gmail.com>
wrote:

> Thank you for the guidance.
>
> I will try building something rough and ask questions if i run into any
> errors.
>
>
>
>
> On Sat, Nov 29, 2014 at 10:38 PM, Pat Ferrel <pa...@occamsmachete.com>
> wrote:
>
>> The Mahout site is a good starting point for using any of the
>> recommenders.
>>
>> http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
>>
>> On Nov 29, 2014, at 1:33 PM, Yash Patel <ya...@gmail.com> wrote:
>>
>> Can you give me some more details on the Hadoop mapreduce item-based
>> cooccurrence recommender.
>>
>>
>> Best Regards,
>> Yash Patel
>>
>> On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel <pa...@occamsmachete.com>
>> wrote:
>>
>> > I built this app with it: https://guide.finderbots.com
>> >
>> > The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
>> > out of the job it is csv text—therefore language and architecture
>> neutral.
>> > I load the data from spark-itemsimilarity into MongoDB using java. Solr
>> is
>> > set up for full-text indexing and queries using data from MongoDB. The
>> > queries are made to Solr through REST from Ruby UX code. You can replace
>> > any component in this stack with whatever you wish and use whatever
>> > language you are comfortable with.
>> >
>> > Alternatively you could modify the UI of Solr or Elasticsearch—both are
>> in
>> > Java.
>> >
>> > If you use any of the other Mahout recommenders they create all recs for
>> > all known users so you’ll still need to build a way to serve those
>> results.
>> > People often use DBs for this and integrate with their web app
>> framework.
>> >
>> > On Nov 28, 2014, at 10:03 AM, Yash Patel <ya...@gmail.com>
>> wrote:
>> >
>> > I looked up spark row similarity but i am not sure if it will suit my
>> needs
>> > as i want to build my recommender as a java application possibly with an
>> > interface.
>> >
>> >
>> > On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <pa...@occamsmachete.com>
>> wrote:
>> >
>> >> Some references:
>> >>
>> >> small free book here, which talks about the general idea:
>> >> https://www.mapr.com/practical-machine-learning
>> >> preso, which talks about mixing actions or other indicators:
>> >>
>> >
>> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
>> >> two blog posts:
>> >>
>> >
>> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
>> >>
>> >
>> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>> >> mahout docs:
>> >>
>> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>> >>
>> >> Build Mahout from this source: https://github.com/apache/mahout This
>> > will
>> >> run stand-alone on a dev machine, then if your data is too big for a
>> > single
>> >> machine you can run it on a Spark + Hadoop cluster. The data this
>> creates
>> >> can be put into a DB or indexed directly by a search engine (Solr or
>> >> Elasticsearch). Choose the search engine you want then queries of a
>> > user’s
>> >> item id history will go there--results will be an ordered list of item
>> > ids
>> >> to recommend.
>> >>
>> >> The core piece is the command line job: “mahout spark-itemsimilarity”,
>> >> which can parse csv data. The options specify what columns are used for
>> > ids.
>> >>
>> >> Start out simple by looking only at user and item IDs. Then you can add
>> >> other cross-cooccurrence indicators for multiple actions later pretty
>> >> easily.
>> >>
>> >>
>> >> On Nov 28, 2014, at 12:14 AM, Yash Patel <ya...@gmail.com>
>> > wrote:
>> >>
>> >> The mahout + search engine recommender seems what would be best for the
>> >> data i have.
>> >>
>> >> Kindly get back to me at your earliest convenience.
>> >>
>> >>
>> >>
>> >> Best Regards,
>> >> Yash Patel
>> >>
>> >> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <pa...@occamsmachete.com>
>> > wrote:
>> >>
>> >>> Mahout has several recommenders so no need to create one from
>> > components.
>> >>> They all make use of the similarity of preferences between
>> users—that’s
>> >> why
>> >>> they are in the category of collaborative filtering.
>> >>>
>> >>> Primary Mahout Recommenders:
>> >>> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all
>> > recs
>> >>> for all users. Uses “Mahout IDs"
>> >>> 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise
>> in
>> >>> the data. Sometimes better for small data sets than #1. Uses “Mahout
>> > IDs"
>> >>> 3) Mahout + search engine: cooccurrence type. Extremely flexible,
>> works
>> >>> with multiple actions (multi-modal), works for new users that have
>> some
>> >>> history, has a scalable server (from the search engine) but is more
>> >>> difficult to integrate than #1 or #2. Uses your own ids and reads csv
>> >> files.
>> >>>
>> >>> The rest of the data seems to apply either to the user or the item and
>> > so
>> >>> would be used in different ways. #1 an #2 can only use user id and
>> item
>> >> id
>> >>> but some post recommendation weighting or filtering can be applied. #3
>> >> can
>> >>> use multiple attributes in different ways. For instance if category is
>> > an
>> >>> item attribute you can create two actions, user-pref-for-an-item, and
>> >>> user-pref-for-a-category. Assuming you want to recommend an item (not
>> >>> category) you can create a cross-ccoccurrence indicator for the second
>> >>> action and use the data to make your item recs better. #3 is the only
>> >>> methods that supports this.
>> >>>
>> >>> Pick a recommender and we can help more with data prep.
>> >>>
>> >>>
>> >>> On Nov 26, 2014, at 1:34 PM, Yash Patel <ya...@gmail.com>
>> > wrote:
>> >>>
>> >>> Hello everyone,
>> >>>
>> >>> wow i am quite happy to see so many inputs from people.
>> >>>
>> >>> I apologize for not providing more details.
>> >>>
>> >>> Although this is not my complete dataset the fields i have chosen to
>> use
>> >>> are:
>> >>>
>> >>> customer id - numeric
>> >>> item id - text
>> >>> postal code - text
>> >>> item category ´- text
>> >>> potential growth - text
>> >>> territory - text
>> >>>
>> >>>
>> >>> Basically i was thinking of finding similar users and recommending
>> them
>> >>> items that users like them have bought but they haven't.
>> >>>
>> >>> Although i would very much like to hear your opinions as i am not so
>> >>> familiar with clustering,classifiers etc.
>> >>>
>> >>> I found that mahout takes sequence files converted into vectors but i
>> >>> couldn't understand how would i do it on my data specifically and more
>> >>> importantly make a recommender system out of it.
>> >>>
>> >>> Also i am wondering how to combine the importance of a specific
>> customer
>> >>> through the potential growth attribute.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Best Regards,
>> >>> Yash Patel
>> >>>
>> >>> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <pa...@occamsmachete.com>
>> >> wrote:
>> >>>
>> >>>> All very good points but note that spark-itemsimilarity may take the
>> >>> input
>> >>>> directly since you specify column numbers for
>> <UID><ITEMID><PREF_VALUE>
>> >>>>
>> >>>> On Nov 26, 2014, at 11:43 AM, parnab kumar <pa...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> kindly elaborate... your requirements... your dataset fields ...and
>> > what
>> >>>> you want to recommend to an user... Usually a set of item is
>> > recommended
>> >>> to
>> >>>> an user. In your case what are your items ?
>> >>>>
>> >>>> The standard input is <UID><ITEMID><PREF_VALUE> . Clearly your data
>> is
>> >>> not
>> >>>> in this format which will let you use directly the algorithms in
>> > Mahout.
>> >>>>
>> >>>> A little more info from your side will help us to give your the right
>> >>>> pointers.
>> >>>>
>> >>>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <yashpatel1230@gmail.com
>> >
>> >>>> wrote:
>> >>>>
>> >>>>> Dear Mahout Team,
>> >>>>>
>> >>>>> I am a student new to machine learning and i am trying to build a
>> user
>> >>>>> based recommender using mahout.
>> >>>>>
>> >>>>> My dataset is a csv file as an input but it has many fields as text
>> > and
>> >>> i
>> >>>>> understand mahout needs numeric values.
>> >>>>>
>> >>>>> Can you give me a headstart as to where i should start and what kind
>> > of
>> >>>>> tools i need to parse the text colummns,
>> >>>>>
>> >>>>> Also an idea on which classifiers or clustering methods i should use
>> >>>> would
>> >>>>> be highly appreciated.
>> >>>>>
>> >>>>>
>> >>>>> Best Regards;
>> >>>>> Yash Patel
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>> >
>>
>>
>

Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
Thank you for the guidance.

I will try building something rough and ask questions if I run into any
errors.




On Sat, Nov 29, 2014 at 10:38 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> The Mahout site is a good starting point for using any of the recommenders.
>
> http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
>
> On Nov 29, 2014, at 1:33 PM, Yash Patel <ya...@gmail.com> wrote:
>
> Can you give me some more details on the Hadoop mapreduce item-based
> cooccurrence recommender.
>
>
> Best Regards,
> Yash Patel
>
> On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
> > I built this app with it: https://guide.finderbots.com
> >
> > The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
> > out of the job it is csv text—therefore language and architecture
> neutral.
> > I load the data from spark-itemsimilarity into MongoDB using java. Solr
> is
> > set up for full-text indexing and queries using data from MongoDB. The
> > queries are made to Solr through REST from Ruby UX code. You can replace
> > any component in this stack with whatever you wish and use whatever
> > language you are comfortable with.
> >
> > Alternatively you could modify the UI of Solr or Elasticsearch—both are
> in
> > Java.
> >
> > If you use any of the other Mahout recommenders they create all recs for
> > all known users so you’ll still need to build a way to serve those
> results.
> > People often use DBs for this and integrate with their web app framework.
> >
> > On Nov 28, 2014, at 10:03 AM, Yash Patel <ya...@gmail.com>
> wrote:
> >
> > I looked up spark row similarity but i am not sure if it will suit my
> needs
> > as i want to build my recommender as a java application possibly with an
> > interface.
> >
> >
> > On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <pa...@occamsmachete.com>
> wrote:
> >
> >> Some references:
> >>
> >> small free book here, which talks about the general idea:
> >> https://www.mapr.com/practical-machine-learning
> >> preso, which talks about mixing actions or other indicators:
> >>
> >
> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> >> two blog posts:
> >>
> >
> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
> >>
> >
> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
> >> mahout docs:
> >>
> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
> >>
> >> Build Mahout from this source: https://github.com/apache/mahout This
> > will
> >> run stand-alone on a dev machine, then if your data is too big for a
> > single
> >> machine you can run it on a Spark + Hadoop cluster. The data this
> creates
> >> can be put into a DB or indexed directly by a search engine (Solr or
> >> Elasticsearch). Choose the search engine you want then queries of a
> > user’s
> >> item id history will go there--results will be an ordered list of item
> > ids
> >> to recommend.
> >>
> >> The core piece is the command line job: “mahout spark-itemsimilarity”,
> >> which can parse csv data. The options specify what columns are used for
> > ids.
> >>
> >> Start out simple by looking only at user and item IDs. Then you can add
> >> other cross-cooccurrence indicators for multiple actions later pretty
> >> easily.
> >>
> >>
> >> On Nov 28, 2014, at 12:14 AM, Yash Patel <ya...@gmail.com>
> > wrote:
> >>
> >> The mahout + search engine recommender seems what would be best for the
> >> data i have.
> >>
> >> Kindly get back to me at your earliest convenience.
> >>
> >>
> >>
> >> Best Regards,
> >> Yash Patel
> >>
> >> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <pa...@occamsmachete.com>
> > wrote:
> >>
> >>> Mahout has several recommenders so no need to create one from
> > components.
> >>> They all make use of the similarity of preferences between users—that’s
> >> why
> >>> they are in the category of collaborative filtering.
> >>>
> >>> Primary Mahout Recommenders:
> >>> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all
> > recs
> >>> for all users. Uses “Mahout IDs"
> >>> 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise
> in
> >>> the data. Sometimes better for small data sets than #1. Uses “Mahout
> > IDs"
> >>> 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
> >>> with multiple actions (multi-modal), works for new users that have some
> >>> history, has a scalable server (from the search engine) but is more
> >>> difficult to integrate than #1 or #2. Uses your own ids and reads csv
> >> files.
> >>>
> >>> The rest of the data seems to apply either to the user or the item and
> > so
> >>> would be used in different ways. #1 an #2 can only use user id and item
> >> id
> >>> but some post recommendation weighting or filtering can be applied. #3
> >> can
> >>> use multiple attributes in different ways. For instance if category is
> > an
> >>> item attribute you can create two actions, user-pref-for-an-item, and
> >>> user-pref-for-a-category. Assuming you want to recommend an item (not
> >>> category) you can create a cross-ccoccurrence indicator for the second
> >>> action and use the data to make your item recs better. #3 is the only
> >>> methods that supports this.
> >>>
> >>> Pick a recommender and we can help more with data prep.
> >>>
> >>>
> >>> On Nov 26, 2014, at 1:34 PM, Yash Patel <ya...@gmail.com>
> > wrote:
> >>>
> >>> Hello everyone,
> >>>
> >>> wow i am quite happy to see so many inputs from people.
> >>>
> >>> I apologize for not providing more details.
> >>>
> >>> Although this is not my complete dataset the fields i have chosen to
> use
> >>> are:
> >>>
> >>> customer id - numeric
> >>> item id - text
> >>> postal code - text
> >>> item category ´- text
> >>> potential growth - text
> >>> territory - text
> >>>
> >>>
> >>> Basically i was thinking of finding similar users and recommending them
> >>> items that users like them have bought but they haven't.
> >>>
> >>> Although i would very much like to hear your opinions as i am not so
> >>> familiar with clustering,classifiers etc.
> >>>
> >>> I found that mahout takes sequence files converted into vectors but i
> >>> couldn't understand how would i do it on my data specifically and more
> >>> importantly make a recommender system out of it.
> >>>
> >>> Also i am wondering how to combine the importance of a specific
> customer
> >>> through the potential growth attribute.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Best Regards,
> >>> Yash Patel
> >>>
> >>> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <pa...@occamsmachete.com>
> >> wrote:
> >>>
> >>>> All very good points but note that spark-itemsimilarity may take the
> >>> input
> >>>> directly since you specify column numbers for
> <UID><ITEMID><PREF_VALUE>
> >>>>
> >>>> On Nov 26, 2014, at 11:43 AM, parnab kumar <pa...@gmail.com>
> >>> wrote:
> >>>>
> >>>> kindly elaborate... your requirements... your dataset fields ...and
> > what
> >>>> you want to recommend to an user... Usually a set of item is
> > recommended
> >>> to
> >>>> an user. In your case what are your items ?
> >>>>
> >>>> The standard input is <UID><ITEMID><PREF_VALUE> . Clearly your data is
> >>> not
> >>>> in this format which will let you use directly the algorithms in
> > Mahout.
> >>>>
> >>>> A little more info from your side will help us to give your the right
> >>>> pointers.
> >>>>
> >>>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <ya...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Dear Mahout Team,
> >>>>>
> >>>>> I am a student new to machine learning and i am trying to build a
> user
> >>>>> based recommender using mahout.
> >>>>>
> >>>>> My dataset is a csv file as an input but it has many fields as text
> > and
> >>> i
> >>>>> understand mahout needs numeric values.
> >>>>>
> >>>>> Can you give me a headstart as to where i should start and what kind
> > of
> >>>>> tools i need to parse the text colummns,
> >>>>>
> >>>>> Also an idea on which classifiers or clustering methods i should use
> >>>> would
> >>>>> be highly appreciated.
> >>>>>
> >>>>>
> >>>>> Best Regards;
> >>>>> Yash Patel
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>

Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
The Mahout site is a good starting point for using any of the recommenders.

http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html

On Nov 29, 2014, at 1:33 PM, Yash Patel <ya...@gmail.com> wrote:

Can you give me some more details on the Hadoop mapreduce item-based
cooccurrence recommender.


Best Regards,
Yash Patel

On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> I built this app with it: https://guide.finderbots.com
> 
> The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
> out of the job it is csv text—therefore language and architecture neutral.
> I load the data from spark-itemsimilarity into MongoDB using java. Solr is
> set up for full-text indexing and queries using data from MongoDB. The
> queries are made to Solr through REST from Ruby UX code. You can replace
> any component in this stack with whatever you wish and use whatever
> language you are comfortable with.
> 
> Alternatively you could modify the UI of Solr or Elasticsearch—both are in
> Java.
> 
> If you use any of the other Mahout recommenders they create all recs for
> all known users so you’ll still need to build a way to serve those results.
> People often use DBs for this and integrate with their web app framework.
> 
> On Nov 28, 2014, at 10:03 AM, Yash Patel <ya...@gmail.com> wrote:
> 
> I looked up spark row similarity but i am not sure if it will suit my needs
> as i want to build my recommender as a java application possibly with an
> interface.
> 
> 
> On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
> 
>> Some references:
>> 
>> small free book here, which talks about the general idea:
>> https://www.mapr.com/practical-machine-learning
>> preso, which talks about mixing actions or other indicators:
>> 
> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
>> two blog posts:
>> 
> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
>> 
> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>> mahout docs:
>> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>> 
>> Build Mahout from this source: https://github.com/apache/mahout This
> will
>> run stand-alone on a dev machine, then if your data is too big for a
> single
>> machine you can run it on a Spark + Hadoop cluster. The data this creates
>> can be put into a DB or indexed directly by a search engine (Solr or
>> Elasticsearch). Choose the search engine you want then queries of a
> user’s
>> item id history will go there--results will be an ordered list of item
> ids
>> to recommend.
>> 
>> The core piece is the command line job: “mahout spark-itemsimilarity”,
>> which can parse csv data. The options specify what columns are used for
> ids.
>> 
>> Start out simple by looking only at user and item IDs. Then you can add
>> other cross-cooccurrence indicators for multiple actions later pretty
>> easily.
>> 
>> 
>> On Nov 28, 2014, at 12:14 AM, Yash Patel <ya...@gmail.com>
> wrote:
>> 
>> The mahout + search engine recommender seems what would be best for the
>> data i have.
>> 
>> Kindly get back to me at your earliest convenience.
>> 
>> 
>> 
>> Best Regards,
>> Yash Patel
>> 
>> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <pa...@occamsmachete.com>
> wrote:
>> 
>>> Mahout has several recommenders so no need to create one from
> components.
>>> They all make use of the similarity of preferences between users—that’s
>> why
>>> they are in the category of collaborative filtering.
>>> 
>>> Primary Mahout Recommenders:
>>> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all
> recs
>>> for all users. Uses “Mahout IDs"
>>> 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise in
>>> the data. Sometimes better for small data sets than #1. Uses “Mahout
> IDs"
>>> 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
>>> with multiple actions (multi-modal), works for new users that have some
>>> history, has a scalable server (from the search engine) but is more
>>> difficult to integrate than #1 or #2. Uses your own ids and reads csv
>> files.
>>> 
>>> The rest of the data seems to apply either to the user or the item and
> so
>>> would be used in different ways. #1 an #2 can only use user id and item
>> id
>>> but some post recommendation weighting or filtering can be applied. #3
>> can
>>> use multiple attributes in different ways. For instance if category is
> an
>>> item attribute you can create two actions, user-pref-for-an-item, and
>>> user-pref-for-a-category. Assuming you want to recommend an item (not
>>> category) you can create a cross-ccoccurrence indicator for the second
>>> action and use the data to make your item recs better. #3 is the only
>>> methods that supports this.
>>> 
>>> Pick a recommender and we can help more with data prep.
>>> 
>>> 


Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
Can you give me some more details on the Hadoop mapreduce item-based
cooccurrence recommender?


Best Regards,
Yash Patel


Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I built this app with it: https://guide.finderbots.com

The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes out of the job it is csv text, and therefore language- and architecture-neutral. I load the data from spark-itemsimilarity into MongoDB using Java. Solr is set up for full-text indexing and queries using data from MongoDB. The queries are made to Solr through REST from Ruby UX code. You can replace any component in this stack with whatever you wish and use whatever language you are comfortable with.

Alternatively you could modify the UI of Solr or Elasticsearch—both are in Java.

If you use any of the other Mahout recommenders they create all recs for all known users so you’ll still need to build a way to serve those results. People often use DBs for this and integrate with their web app framework.
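Whatever store or search engine you pick, loading the model is just text parsing. Here is a rough sketch in plain Java of reading one line of spark-itemsimilarity output into a map you could then write to MongoDB or post to Solr. It assumes the default tab-separated output format of `item<TAB>item1:score item2:score ...`; check your actual output files before relying on this.

```java
import java.util.*;

// Loads one spark-itemsimilarity output line into an in-memory map so the
// similar-item lists can be stored in a DB or posted to a search index.
public class IndicatorLoader {
    // Parses a line like "itemA\titemB:3.2 itemC:1.7" into {itemB=3.2, itemC=1.7};
    // the row's item id is appended to itemIdOut.
    public static Map<String, Double> parseLine(String line, StringBuilder itemIdOut) {
        String[] parts = line.split("\t", 2);
        itemIdOut.append(parts[0]);
        Map<String, Double> similar = new LinkedHashMap<>();  // keep file order
        if (parts.length > 1 && !parts[1].trim().isEmpty()) {
            for (String pair : parts[1].trim().split("\\s+")) {
                int colon = pair.lastIndexOf(':');
                similar.put(pair.substring(0, colon),
                            Double.parseDouble(pair.substring(colon + 1)));
            }
        }
        return similar;
    }

    public static void main(String[] args) {
        StringBuilder id = new StringBuilder();
        Map<String, Double> sims = parseLine("iphone\tipad:3.2 galaxy:1.7", id);
        System.out.println(id + " -> " + sims); // iphone -> {ipad=3.2, galaxy=1.7}
    }
}
```

Feed the similar-item list for each item into a Solr/Elasticsearch field, then query that field with the user's history.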



Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
I looked up spark row similarity, but I am not sure it will suit my needs,
as I want to build my recommender as a Java application, possibly with an
interface.



Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Some references:

small free book here, which talks about the general idea: https://www.mapr.com/practical-machine-learning
preso, which talks about mixing actions or other indicators: http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/ 
two blog posts: http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/ http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
mahout docs: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

Build Mahout from this source: https://github.com/apache/mahout This will run stand-alone on a dev machine; then, if your data is too big for a single machine, you can run it on a Spark + Hadoop cluster. The data this creates can be put into a DB or indexed directly by a search engine (Solr or Elasticsearch). Choose the search engine you want; then queries of a user's item id history will go there, and the results will be an ordered list of item ids to recommend.

The core piece is the command line job: “mahout spark-itemsimilarity”, which can parse csv data. The options specify what columns are used for ids.

Start out simple by looking only at user and item IDs. Then you can add other cross-cooccurrence indicators for multiple actions later pretty easily.
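To get from a raw CSV like yours (customer id, item id, postal code, category, ...) to that input, you only need to project out two columns per action. A hypothetical sketch in plain Java (the class, column positions, and file layout are assumptions; adjust them to your real data before use):

```java
import java.util.*;

// Projects rows of a raw CSV into the (user,item) pairs that
// spark-itemsimilarity consumes: one pair list per action.
public class ActionExtractor {
    // Assumed column layout: customerId,itemId,postalCode,category,growth,territory
    public static List<String> purchaseAction(List<String> csvRows) {
        List<String> out = new ArrayList<>();
        for (String row : csvRows) {
            String[] f = row.split(",");
            out.add(f[0] + "," + f[1]);   // primary action: user bought item
        }
        return out;
    }

    public static List<String> categoryAction(List<String> csvRows) {
        List<String> out = new ArrayList<>();
        for (String row : csvRows) {
            String[] f = row.split(",");
            out.add(f[0] + "," + f[3]);   // secondary action: user bought in category
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("42,SKU-9,90210,tools,high,west");
        System.out.println(purchaseAction(rows)); // [42,SKU-9]
        System.out.println(categoryAction(rows)); // [42,tools]
    }
}
```

Write the purchase pairs to one file and the category pairs to another; the second file becomes the cross-cooccurrence input later.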




Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
The Mahout + search engine recommender seems like it would be best for the
data I have.

Kindly get back to me at your earliest convenience.



Best Regards,
Yash Patel


Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Mahout has several recommenders, so there is no need to create one from components. They all make use of the similarity of preferences between users; that's why they are in the category of collaborative filtering.

Primary Mahout Recommenders:
1) Hadoop mapreduce item-based cooccurrence recommender. Creates all recs for all users. Uses “Mahout IDs”.
2) ALS-WR Hadoop mapreduce; uses matrix factorization to reduce noise in the data. Sometimes better for small data sets than #1. Uses “Mahout IDs”.
3) Mahout + search engine: cooccurrence type. Extremely flexible: works with multiple actions (multi-modal), works for new users that have some history, and has a scalable server (from the search engine), but is more difficult to integrate than #1 or #2. Uses your own ids and reads csv files.

The rest of the data seems to apply either to the user or the item and so would be used in different ways. #1 and #2 can only use user id and item id, but some post-recommendation weighting or filtering can be applied. #3 can use multiple attributes in different ways. For instance, if category is an item attribute you can create two actions: user-pref-for-an-item and user-pref-for-a-category. Assuming you want to recommend an item (not a category), you can create a cross-cooccurrence indicator for the second action and use that data to make your item recs better. #3 is the only method that supports this.

Pick a recommender and we can help more with data prep.
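To make the multi-action idea in #3 concrete, here is a rough plain-Java sketch of the data prep. The column order follows Yash's field list, and the action tags "item-pref" and "category-pref" are names I made up for illustration, not anything Mahout defines:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrepMultiModal {
    // Turns each CSV row (customerId,itemId,postalCode,category,growth,territory)
    // into two action lines: one for the item preference, one for the category.
    static List<String> prep(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] f = row.split(",");
            String user = f[0], item = f[1], category = f[3];
            out.add(user + ",item-pref," + item);          // primary action
            out.add(user + ",category-pref," + category);  // cross-action evidence
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "1001,SKU-9,90210,tools,high,west",
            "1002,SKU-3,10001,garden,low,east");
        prep(rows).forEach(System.out::println);
    }
}
```

spark-itemsimilarity could then build an indicator from the item-pref lines and a cross-cooccurrence indicator from the category-pref lines; check the intro-cooccurrence-spark docs for the exact flags and delimiters it expects.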


On Nov 26, 2014, at 1:34 PM, Yash Patel <ya...@gmail.com> wrote:

Hello everyone,

wow i am quite happy to see so many inputs from people.

I apologize for not providing more details.

Although this is not my complete dataset the fields i have chosen to use
are:

customer id - numeric
item id - text
postal code - text
item category - text
potential growth - text
territory - text


Basically i was thinking of finding similar users and recommending them
items that users like them have bought but they haven't.

Although I would very much like to hear your opinions, as I am not so
familiar with clustering, classifiers, etc.

I found that Mahout takes sequence files converted into vectors, but I
couldn't understand how I would do it on my data specifically and, more
importantly, how to make a recommender system out of it.

Also I am wondering how to factor in the importance of a specific customer
through the potential growth attribute.






Best Regards,
Yash Patel

On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> All very good points but note that spark-itemsimilarity may take the input
> directly since you specify column numbers for <UID><ITEMID><PREF_VALUE>
> 
> On Nov 26, 2014, at 11:43 AM, parnab kumar <pa...@gmail.com> wrote:
> 
> kindly elaborate... your requirements... your dataset fields ...and what
> you want to recommend to an user... Usually a set of item is recommended to
> an user. In your case what are your items ?
> 
> The standard input is <UID><ITEMID><PREF_VALUE> . Clearly your data is not
> in this format which will let you use directly the algorithms in Mahout.
> 
> A little more info from your side will help us to give your the right
> pointers.
> 
> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <ya...@gmail.com>
> wrote:
> 
>> Dear Mahout Team,
>> 
>> I am a student new to machine learning and i am trying to build a user
>> based recommender using mahout.
>> 
>> My dataset is a csv file as an input but it has many fields as text and i
>> understand mahout needs numeric values.
>> 
>> Can you give me a headstart as to where i should start and what kind of
>> tools i need to parse the text colummns,
>> 
>> Also an idea on which classifiers or clustering methods i should use
> would
>> be highly appreciated.
>> 
>> 
>> Best Regards;
>> Yash Patel
>> 
> 
> 


Re: User based recommender

Posted by Yash Patel <ya...@gmail.com>.
Hello everyone,

wow i am quite happy to see so many inputs from people.

I apologize for not providing more details.

Although this is not my complete dataset the fields i have chosen to use
are:

customer id - numeric
item id - text
postal code - text
item category - text
potential growth - text
territory - text


Basically i was thinking of finding similar users and recommending them
items that users like them have bought but they haven't.

Although I would very much like to hear your opinions, as I am not so
familiar with clustering, classifiers, etc.

I found that Mahout takes sequence files converted into vectors, but I
couldn't understand how I would do it on my data specifically and, more
importantly, how to make a recommender system out of it.

Also I am wondering how to factor in the importance of a specific customer
through the potential growth attribute.






Best Regards,
Yash Patel

On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> All very good points but note that spark-itemsimilarity may take the input
> directly since you specify column numbers for <UID><ITEMID><PREF_VALUE>
>
> On Nov 26, 2014, at 11:43 AM, parnab kumar <pa...@gmail.com> wrote:
>
> kindly elaborate... your requirements... your dataset fields ...and what
> you want to recommend to an user... Usually a set of item is recommended to
> an user. In your case what are your items ?
>
> The standard input is <UID><ITEMID><PREF_VALUE> . Clearly your data is not
> in this format which will let you use directly the algorithms in Mahout.
>
> A little more info from your side will help us to give your the right
> pointers.
>
> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <ya...@gmail.com>
> wrote:
>
> > Dear Mahout Team,
> >
> > I am a student new to machine learning and i am trying to build a user
> > based recommender using mahout.
> >
> > My dataset is a csv file as an input but it has many fields as text and i
> > understand mahout needs numeric values.
> >
> > Can you give me a headstart as to where i should start and what kind of
> > tools i need to parse the text colummns,
> >
> > Also an idea on which classifiers or clustering methods i should use
> would
> > be highly appreciated.
> >
> >
> > Best Regards;
> > Yash Patel
> >
>
>

Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
All very good points but note that spark-itemsimilarity may take the input directly since you specify column numbers for <UID><ITEMID><PREF_VALUE>

On Nov 26, 2014, at 11:43 AM, parnab kumar <pa...@gmail.com> wrote:

kindly elaborate... your requirements... your dataset fields ...and what
you want to recommend to an user... Usually a set of item is recommended to
an user. In your case what are your items ?

The standard input is <UID><ITEMID><PREF_VALUE> . Clearly your data is not
in this format which will let you use directly the algorithms in Mahout.

A little more info from your side will help us to give your the right
pointers.

On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <ya...@gmail.com> wrote:

> Dear Mahout Team,
> 
> I am a student new to machine learning and i am trying to build a user
> based recommender using mahout.
> 
> My dataset is a csv file as an input but it has many fields as text and i
> understand mahout needs numeric values.
> 
> Can you give me a headstart as to where i should start and what kind of
> tools i need to parse the text colummns,
> 
> Also an idea on which classifiers or clustering methods i should use would
> be highly appreciated.
> 
> 
> Best Regards;
> Yash Patel
> 


Re: User based recommender

Posted by parnab kumar <pa...@gmail.com>.
Kindly elaborate on your requirements, your dataset fields, and what
you want to recommend to a user. Usually a set of items is recommended to
a user. In your case, what are your items?

The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is not
in this format, which would let you use the algorithms in Mahout directly.
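For instance, a well-formed input file would look like this (the IDs and preference values here are made up):

```csv
101,2002,3.0
101,2017,5.0
102,2002,1.0
```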

A little more info from your side will help us to give you the right
pointers.

On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <ya...@gmail.com> wrote:

> Dear Mahout Team,
>
> I am a student new to machine learning and i am trying to build a user
> based recommender using mahout.
>
> My dataset is a csv file as an input but it has many fields as text and i
> understand mahout needs numeric values.
>
> Can you give me a headstart as to where i should start and what kind of
> tools i need to parse the text colummns,
>
> Also an idea on which classifiers or clustering methods i should use would
> be highly appreciated.
>
>
> Best Regards;
> Yash Patel
>

Re: User based recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Hi Yash,

What exactly do you mean by “user-based” recommender? What does your data look like? What are the columns in the CSV? For collaborative filtering you will need a user-ID and an item-ID for each preference the user has expressed.  

Mahout has several recommenders so building one should be easy. Is it ok to use an existing one?

For all the recommenders you need a CSV of:
user-ID,item-ID,preference-strength(optional)

For the older in-memory or hadoop mapreduce recommenders the IDs must be ordinal non-negative ints that correspond to row and column numbers for the input matrix that will be created from all input elements. The first time you see a user-ID, give it a Mahout ID of 0; the next unique user-ID gets 1, and so on. The same goes for item-IDs.
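A minimal plain-Java sketch of that ID translation (the class and method names are mine, not a Mahout API):

```java
import java.util.HashMap;
import java.util.Map;

public class IdIndex {
    private final Map<String, Integer> ids = new HashMap<>();

    // Returns the ordinal Mahout ID for an application ID,
    // assigning 0, 1, 2, ... in order of first appearance.
    public int toMahoutId(String appId) {
        return ids.computeIfAbsent(appId, k -> ids.size());
    }

    public static void main(String[] args) {
        IdIndex users = new IdIndex();
        System.out.println(users.toMahoutId("cust-42")); // 0
        System.out.println(users.toMahoutId("cust-99")); // 1
        System.out.println(users.toMahoutId("cust-42")); // 0 again
    }
}
```

Run one index for user-IDs and another for item-IDs, and keep the maps around (or write them out) so you can translate Mahout's recommended item indexes back to your real item-IDs afterwards.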

The newest technique is to use Mahout v1 built from source with Spark and the spark-itemsimilarity job, which will take your application specific ID strings and use them directly. Since this job takes CSVs as input you may be able to use your existing input file(s). The job creates a text file that can be indexed with a search engine to produce recommendations via queries. The query is a list of user history (a list of item-IDs). You get back an ordered list of item-IDs to recommend. 

Docs here: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html


On Nov 26, 2014, at 11:16 AM, Yash Patel <ya...@gmail.com> wrote:

Dear Mahout Team,

I am a student new to machine learning and i am trying to build a user
based recommender using mahout.

My dataset is a csv file as an input but it has many fields as text and i
understand mahout needs numeric values.

Can you give me a headstart as to where i should start and what kind of
tools i need to parse the text colummns,

Also an idea on which classifiers or clustering methods i should use would
be highly appreciated.


Best Regards;
Yash Patel