Posted to user@mahout.apache.org by Jason Lee <wu...@gmail.com> on 2013/07/24 06:28:08 UTC

Implement LinkedIn's PYMK(People You May Know) feature use Mahout, any suggestions?

Hi all,

I am currently working on a recommendation system for an SNS site with 15M+
registered members. We already have a PYMK implementation (it does not use
Mahout or any other machine learning library), but the accuracy of its
recommendations is not as good as we expected, so I'm looking for a better
way to implement this feature.

Here are some signals that should be considered when recommending "People You
May Know" to the current member (any additions?):
Contact list imported by the current member;
Same company:
    overlap of employment date ranges between the current member and the
    recommended member;
    size of the company;
    function of the current member and the recommended member;
Same login IP
Same school
Mutual friends
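
As a rough illustration of how the signals listed above could be turned into numbers for one (member, candidate) pair, here is a minimal Python sketch. The record layout (the keys "company", "start", "end", "school", "login_ips", "friends") and both function names are hypothetical, made up for this example; nothing here is a Mahout API.

```python
from datetime import date

def employment_overlap_days(start_a, end_a, start_b, end_b):
    """Number of days both members were employed at the same time (0 if none)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    # +1 because both end dates are treated as inclusive.
    return max(0, (earliest_end - latest_start).days + 1)

def pair_features(member, candidate):
    """Build a feature dict for one (member, candidate) pair.

    The dictionary keys used on the inputs are assumptions for this sketch.
    """
    same_company = member["company"] == candidate["company"]
    overlap = 0
    if same_company:
        overlap = employment_overlap_days(
            member["start"], member["end"], candidate["start"], candidate["end"]
        )
    return {
        "same_company": int(same_company),
        "employment_overlap_days": overlap,
        "same_school": int(member["school"] == candidate["school"]),
        "same_login_ip": int(bool(member["login_ips"] & candidate["login_ips"])),
        "mutual_friends": len(member["friends"] & candidate["friends"]),
    }

alice = {
    "company": "Acme", "start": date(2010, 1, 1), "end": date(2012, 12, 31),
    "school": "MIT", "login_ips": {"10.0.0.1"}, "friends": {"carol", "dave"},
}
bob = {
    "company": "Acme", "start": date(2012, 7, 1), "end": date(2013, 6, 30),
    "school": "CMU", "login_ips": {"10.0.0.1", "10.0.0.2"}, "friends": {"carol", "erin"},
}
features = pair_features(alice, bob)
```

Company size and function are left out only to keep the sketch short; they would be added as further keys in the same dict.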


As far as I know, Mahout focuses on CF (collaborative filtering), but PYMK
looks more like content-based recommendation, because the information held
in members' profiles is the basis of PYMK processing.

Re: Implement LinkedIn's PYMK(People You May Know) feature use Mahout, any suggestions?

Posted by Kevin Schiesser <ke...@xoom.com>.
Hi,

I am also very new to Mahout.

My understanding is that Mahout aims to implement the 10 machine learning
algorithms described in this paper:
http://cs.stanford.edu/people/ang/papers/nips06-mapreducemulticore.pdf

Related to classification, try running the classify-20newsgroups example.
It's a good way to be sure the system is working, and to get familiar with
the input/output of Mahout.

-Kevin



Re: Implement LinkedIn's PYMK(People You May Know) feature use Mahout, any suggestions?

Posted by Sebastian Schelter <ss...@apache.org>.
Jason,

You should also search the literature for "link prediction"; that's the
academic term for the problem you describe.

This paper might be a good starting point:

"The Link Prediction Problem for Social Networks"

http://www.cs.cornell.edu/home/kleinber/link-pred.pdf
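
To give a flavour of what link prediction looks like in practice, here is a minimal Python sketch of the Adamic/Adar score, one of the neighbourhood-based measures compared in that paper. The toy friendship graph and the function name are made up for illustration, and the graph is assumed to be undirected and stored as a dict of neighbour sets.

```python
import math

def adamic_adar(graph, u, v):
    """Adamic/Adar score: common neighbours weighted by 1 / log(degree).

    A shared friend with few connections counts for more than a shared
    friend who is connected to almost everybody.
    """
    common = graph[u] & graph[v]
    # Degree-1 nodes are skipped because log(1) == 0; in an undirected graph
    # a common neighbour of two distinct nodes always has degree >= 2.
    return sum(1.0 / math.log(len(graph[z])) for z in common if len(graph[z]) > 1)

# Hypothetical undirected friendship graph (neighbour sets).
graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

score_ab = adamic_adar(graph, "a", "b")  # "a" and "b" share "c" and "d"
score_ae = adamic_adar(graph, "a", "e")  # "a" and "e" share only "d"
```

Ranking all non-connected candidates of a member by such a score is one simple unsupervised baseline; the score can also be fed into a classifier as one more predictor.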



Re: Implement LinkedIn's PYMK(People You May Know) feature use Mahout, any suggestions?

Posted by Ted Dunning <te...@gmail.com>.
I don't see the contact list of the potential connection.  Overlap of
connection lists should be an extremely strong signal.
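
A minimal Python sketch of that overlap signal, assuming each member's connections are available as a collection of member IDs (the function name and the sample IDs are made up for illustration):

```python
def connection_overlap(connections_a, connections_b):
    """Return (number of shared connections, Jaccard similarity)."""
    a, b = set(connections_a), set(connections_b)
    common = a & b
    union = a | b
    # Jaccard similarity: shared connections divided by all distinct connections.
    jaccard = len(common) / len(union) if union else 0.0
    return len(common), jaccard

shared, jaccard = connection_overlap([1, 2, 3], [2, 3, 4])
```

The raw count favours members with very large connection lists, while the Jaccard ratio normalizes for list size, so it may be worth keeping both as separate features.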

You are correct that this tends to be implemented as a classification problem.
The target variable is a binary variable that indicates whether the person
knows or does not know the potential connection. Predictor variables
include what you have described as well as many variants of the same.
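
The classification framing can be sketched as follows. This is a minimal pure-Python logistic regression trained by stochastic gradient descent, not Mahout code; the three feature columns (mutual friend count, same company, same school), the toy training rows, and the hyperparameters are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Estimated probability that the member knows the candidate."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - predict(weights, bias, x)
            bias += lr * err
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias

# Hypothetical training data: one row per (member, candidate) pair.
# Columns: [mutual_friends, same_company, same_school]
rows = [[5, 1, 1], [3, 1, 0], [4, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]]
# Target: 1 if the member knows the candidate, 0 otherwise.
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(rows, labels)
p_known = predict(weights, bias, [5, 1, 1])
p_unknown = predict(weights, bias, [0, 0, 0])
```

In a real system the labels would come from observed behaviour (for example accepted versus ignored suggestions), and candidates would be ranked by the predicted probability.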


