Posted to user@mahout.apache.org by Gökhan Çapan <gk...@gmail.com> on 2011/02/01 00:26:35 UTC

Recommending on Dynamic Content

Hi,

I've searched the archives; sorry in case this is a double post.
Also, this question may not be directly related to Mahout.

Within a domain which is entirely user-generated and has very high item
churn (lots of new items arriving while others leave the system), what
would you recommend for producing accurate recommendations using Mahout
(not just Taste)?

I mean, as a concrete example, the eBay domain, not Amazon's.

Currently I am creating item clusters using LSH with MinHash (I am not sure
if it is in Mahout; I can contribute it if it is not), and producing
recommendations using these item clusters (profiles). When a new item
arrives, I find its nearest profile, and recommend the item wherever that
profile is recommended. Do you find this approach good enough?
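
Roughly, the signature step I have in mind looks like this (a minimal
sketch of my own with made-up hash parameters, not Mahout code):

import java.util.Arrays;
import java.util.Random;
import java.util.Set;

// Minimal MinHash signatures for items represented as sets of user ids.
// Items whose signatures agree in many positions have high estimated
// Jaccard similarity, so banding the signature gives the LSH buckets
// (profiles) described above.
public class MinHasher {
  private static final long PRIME = (1L << 31) - 1;
  private final long[] a;
  private final long[] b;

  public MinHasher(int numHashes, long seed) {
    Random rnd = new Random(seed);
    a = new long[numHashes];
    b = new long[numHashes];
    for (int i = 0; i < numHashes; i++) {
      a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1);
      b[i] = rnd.nextInt(Integer.MAX_VALUE);
    }
  }

  public long[] signature(Set<Long> userIds) {
    long[] sig = new long[a.length];
    Arrays.fill(sig, Long.MAX_VALUE);
    for (long u : userIds) {
      long x = ((u % PRIME) + PRIME) % PRIME;  // map the id into the field
      for (int i = 0; i < a.length; i++) {
        long h = (a[i] * x + b[i]) % PRIME;    // i-th universal hash
        if (h < sig[i]) sig[i] = h;            // keep the minimum
      }
    }
    return sig;
  }
}

A new item's nearest profile is then just the bucket whose signature bands
it shares.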

If you have a theoretical idea, could you please point me to some related
papers?

(As an MSc student, I can implement this as a Google Summer of Code project,
with your mentoring.)

Thanks in advance

-- 
Gokhan

Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
Here is a pointer to the Menon and Elkan paper:
http://arxiv.org/abs/1006.2156

Also, see chapter 17 of Mahout in Action
for a description of how you can use the SGD classifiers already in Mahout
for this kind of work.

You lose the very cool recommendations framework that Mahout has, but you
gain the ability to recommend in high-churn situations.
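
For a flavor of what that looks like (a rough sketch, not the book's
example; the hashed features below are stand-ins for what Mahout's
feature encoders would produce):

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class ClickModelSketch {
  public static void main(String[] args) {
    int numFeatures = 10000;
    // Binary "will this user interact with this item?" classifier over
    // hashed user and item content features, so a brand-new item can be
    // scored without any interaction history.
    OnlineLogisticRegression learner =
        new OnlineLogisticRegression(2, numFeatures, new L1())
            .learningRate(0.1);

    // One fake training example; real code would hash user and item
    // attributes into the sparse vector.
    Vector x = new RandomAccessSparseVector(numFeatures);
    x.set(Math.abs("item:category=shoes".hashCode()) % numFeatures, 1);
    x.set(Math.abs("user:123".hashCode()) % numFeatures, 1);
    learner.train(1, x);   // 1 = clicked, 0 = not clicked

    // Scoring a new (user, item) pair is just classification.
    System.out.println(learner.classifyScalar(x));
  }
}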

On Tue, Feb 1, 2011 at 1:52 AM, Sean Owen <sr...@gmail.com> wrote:

> One approach is to use user-user similarities. Those build up over time
> based on historical data, but can be used to produce recommendations for
> brand-new items going forward.
>
> It still has a cold-start problem; until anyone connects to one of those
> new items, it can't be recommended.
>
> Another approach is to use the item's characteristics to determine some
> notion of similarity, in the absence of clicks. That's what you're doing
> and it's a great approach.
>
> You can also consider hybrid approaches. You could try to mix
> recommendations based on two different approaches -- clicks-based and
> content-based. The problem is knowing how to mix things since the scores
> are not at all comparable.
>
> That Elkan / Menon paper has an elegant theoretical formulation of a
> recommender that uses both ratings and side info at the same time.
>
>
> On Mon, Jan 31, 2011 at 11:26 PM, Gökhan Çapan <gk...@gmail.com> wrote:
>
> > Hi,
> >
> > I've searched the archives; sorry in case this is a double post.
> > Also, this question may not be directly related to Mahout.
> >
> > Within a domain which is entirely user-generated and has very high item
> > churn (lots of new items arriving while others leave the system), what
> > would you recommend for producing accurate recommendations using Mahout
> > (not just Taste)?
> >
> > I mean, as a concrete example, the eBay domain, not Amazon's.
> >
> > Currently I am creating item clusters using LSH with MinHash (I am not
> > sure if it is in Mahout; I can contribute it if it is not), and producing
> > recommendations using these item clusters (profiles). When a new item
> > arrives, I find its nearest profile, and recommend the item wherever that
> > profile is recommended. Do you find this approach good enough?
> >
> > If you have a theoretical idea, could you please point me to some related
> > papers?
> >
> > (As an MSc student, I can implement this as a Google Summer of Code
> > project,
> > with your mentoring.)
> >
> > Thanks in advance
> >
> > --
> > Gokhan
> >
>

Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
Low-cost access to this paper is here:
http://www.deepdyve.com/lp/association-for-computing-machinery/regression-based-latent-factor-models-1ebJXMCs0K

(Shameless plug: I used to work at DeepDyve and am still an advisor.)

On Tue, Feb 1, 2011 at 12:27 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> There's also a paper from Yahoo! Research, "Regression-based Latent Factor
> Models": http://portal.acm.org/citation.cfm?id=1557029
>
> What I like about this is that it doesn't focus on a particular method to
> combine the models to regress on static profile data + side info. I think
> it might be combined with methods like ALS-WR, which unlike SGD are
> Hadoop-parallelizable, to do the stage computations. It also serves pretty
> well in situations where there are dyadic interactions but different types
> of interaction context (side info) are available (or sometimes none at
> all), while static profile information is always available. I think we'll
> have to get on this problem pretty soon.
>
>
> On Tue, Feb 1, 2011 at 8:24 AM, Ted Dunning <te...@gmail.com> wrote:
>
> > And the MAHOUT-525 GitHub branch of Mahout that I started has an
> > apparently working version of this algorithm.
> >
> > I would love to support anyone who wants to do last mile work on that
> > stuff.
> >
> > See https://issues.apache.org/jira/browse/MAHOUT-525 for more info
> >
> > On Tue, Feb 1, 2011 at 1:52 AM, Sean Owen <sr...@gmail.com> wrote:
> >
> > > That Elkan / Menon paper has an elegant theoretical formulation of a
> > > recommender that uses both ratings and side info at the same time.
> > >
> >
>

Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
Oops.

Yes.  That link is good.  But here is the one that actually illustrates what
I was claiming:

http://lccc.eecs.berkeley.edu/Slides/Weimer_10.pdf

On Tue, Feb 1, 2011 at 1:49 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> The link seems to be a nice summary presentation of the Yahoo paper, same
> authors. Nice.
>
> On Tue, Feb 1, 2011 at 1:09 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > This looks (based on the first page) very similar to the Menon and Elkan
> > paper.
> >
> > Note that parallel != fast.  The LFL implementation of Menon and Elkan
> > reportedly munches all of Netflix in about 8 minutes if I remember
> > correctly.  Most batch update gradient methods are highly parallelizable,
> > but are slower even after parallelization than sequential SGD
> > implementations.  In Mahout on the relatively small 20 newsgroups, SGD is
> > faster than anything else we have.  This applies to pretty large problem
> > sizes (tens of millions of training examples after stratified
> > down-sampling, billions before).
> >
> > Conversely, just because SGD isn't normally parallelized doesn't mean it
> > can't be.  See here for a counter-example:
> > http://www.ideal.ece.utexas.edu/seminar/LatentFactorModels.pdf  (thanks to
> > Isabel for hooking me up with Markus)
> >
> > On Tue, Feb 1, 2011 at 12:27 PM, Dmitriy Lyubimov <dl...@gmail.com>
> > wrote:
> >
> > > There's also a paper from Yahoo! Research, "Regression-based Latent
> > > Factor Models": http://portal.acm.org/citation.cfm?id=1557029
> > >
> > > What I like about this is that it doesn't focus on a particular method
> > > to combine the models to regress on static profile data + side info. I
> > > think it might be combined with methods like ALS-WR, which unlike SGD
> > > are Hadoop-parallelizable, to do the stage computations. It also serves
> > > pretty well in situations where there are dyadic interactions but
> > > different types of interaction context (side info) are available (or
> > > sometimes none at all), while static profile information is always
> > > available. I think we'll have to get on this problem pretty soon.
> > >
> > >
> > > On Tue, Feb 1, 2011 at 8:24 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> > >
> > > > And the MAHOUT-525 GitHub branch of Mahout that I started has an
> > > > apparently working version of this algorithm.
> > > >
> > > > I would love to support anyone who wants to do last mile work on that
> > > > stuff.
> > > >
> > > > See https://issues.apache.org/jira/browse/MAHOUT-525 for more info
> > > >
> > > > On Tue, Feb 1, 2011 at 1:52 AM, Sean Owen <sr...@gmail.com> wrote:
> > > >
> > > > > That Elkan / Menon paper has an elegant theoretical formulation of
> a
> > > > > recommender that uses both ratings and side info at the same time.
> > > > >
> > > >
> > >
> >
>

Re: Recommending on Dynamic Content

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
The link seems to be a nice summary presentation of the Yahoo paper, same
authors. Nice.

On Tue, Feb 1, 2011 at 1:09 PM, Ted Dunning <te...@gmail.com> wrote:

> This looks (based on the first page) very similar to the Menon and Elkan
> paper.
>
> Note that parallel != fast.  The LFL implementation of Menon and Elkan
> reportedly munches all of Netflix in about 8 minutes if I remember
> correctly.  Most batch update gradient methods are highly parallelizable,
> but are slower even after parallelization than sequential SGD
> implementations.  In Mahout on the relatively small 20 newsgroups, SGD is
> faster than anything else we have.  This applies to pretty large problem
> sizes (tens of millions of training examples after stratified
> down-sampling, billions before).
>
> Conversely, just because SGD isn't normally parallelized doesn't mean it
> can't be.  See here for a counter-example:
> http://www.ideal.ece.utexas.edu/seminar/LatentFactorModels.pdf  (thanks to
> Isabel for hooking me up with Markus)
>
> On Tue, Feb 1, 2011 at 12:27 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
>
> > There's also a paper from Yahoo! Research, "Regression-based Latent Factor
> > Models": http://portal.acm.org/citation.cfm?id=1557029
> >
> > What I like about this is that it doesn't focus on a particular method to
> > combine the models to regress on static profile data + side info. I think
> > it might be combined with methods like ALS-WR, which unlike SGD are
> > Hadoop-parallelizable, to do the stage computations. It also serves pretty
> > well in situations where there are dyadic interactions but different types
> > of interaction context (side info) are available (or sometimes none at
> > all), while static profile information is always available. I think we'll
> > have to get on this problem pretty soon.
> >
> >
> > On Tue, Feb 1, 2011 at 8:24 AM, Ted Dunning <te...@gmail.com>
> wrote:
> >
> > > And the MAHOUT-525 GitHub branch of Mahout that I started has an
> > > apparently working version of this algorithm.
> > >
> > > I would love to support anyone who wants to do last mile work on that
> > > stuff.
> > >
> > > See https://issues.apache.org/jira/browse/MAHOUT-525 for more info
> > >
> > > On Tue, Feb 1, 2011 at 1:52 AM, Sean Owen <sr...@gmail.com> wrote:
> > >
> > > > That Elkan / Menon paper has an elegant theoretical formulation of a
> > > > recommender that uses both ratings and side info at the same time.
> > > >
> > >
> >
>

Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
This looks (based on the first page) very similar to the Menon and Elkan
paper.

Note that parallel != fast.  The LFL implementation of Menon and Elkan
reportedly munches all of Netflix in about 8 minutes if I remember
correctly.  Most batch update gradient methods are highly parallelizable,
but are slower even after parallelization than sequential SGD
implementations.  In Mahout on the relatively small 20 newsgroups, SGD is
faster than anything else we have.  This applies to pretty large problem
sizes (tens of millions of training examples after stratified down-sampling,
billions before).

Conversely, just because SGD isn't normally parallelized doesn't mean it
can't be.  See here for a counter-example:
http://www.ideal.ece.utexas.edu/seminar/LatentFactorModels.pdf  (thanks to
Isabel for hooking me up with Markus)

On Tue, Feb 1, 2011 at 12:27 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> There's also a paper from Yahoo! Research, "Regression-based Latent Factor
> Models": http://portal.acm.org/citation.cfm?id=1557029
>
> What I like about this is that it doesn't focus on a particular method to
> combine the models to regress on static profile data + side info. I think
> it might be combined with methods like ALS-WR, which unlike SGD are
> Hadoop-parallelizable, to do the stage computations. It also serves pretty
> well in situations where there are dyadic interactions but different types
> of interaction context (side info) are available (or sometimes none at
> all), while static profile information is always available. I think we'll
> have to get on this problem pretty soon.
>
>
> On Tue, Feb 1, 2011 at 8:24 AM, Ted Dunning <te...@gmail.com> wrote:
>
> > And the MAHOUT-525 GitHub branch of Mahout that I started has an
> > apparently working version of this algorithm.
> >
> > I would love to support anyone who wants to do last mile work on that
> > stuff.
> >
> > See https://issues.apache.org/jira/browse/MAHOUT-525 for more info
> >
> > On Tue, Feb 1, 2011 at 1:52 AM, Sean Owen <sr...@gmail.com> wrote:
> >
> > > That Elkan / Menon paper has an elegant theoretical formulation of a
> > > recommender that uses both ratings and side info at the same time.
> > >
> >
>

Re: Recommending on Dynamic Content

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
There's also a paper from Yahoo! Research, "Regression-based Latent Factor
Models": http://portal.acm.org/citation.cfm?id=1557029

What I like about this is that it doesn't focus on a particular method to
combine the models to regress on static profile data + side info. I think
it might be combined with methods like ALS-WR, which unlike SGD are
Hadoop-parallelizable, to do the stage computations. It also serves pretty
well in situations where there are dyadic interactions but different types
of interaction context (side info) are available (or sometimes none at
all), while static profile information is always available. I think we'll
have to get on this problem pretty soon.
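
As I read it, the core of their model is roughly this (my paraphrase and
notation, not the paper's):

    r_ij  ~=  b^T x_ij + u_i^T v_j,
    with  u_i = G x_i + eps_i   and   v_j = D z_j + eta_j,

where x_i and z_j are static user/item profiles, G and D are regression
matrices, and eps_i, eta_j are per-user and per-item corrections learned
from the observed interactions. For a brand-new item with no interactions,
v_j falls back to the regression prior D z_j, which is exactly what you
want in a high-churn catalog.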


On Tue, Feb 1, 2011 at 8:24 AM, Ted Dunning <te...@gmail.com> wrote:

> And the MAHOUT-525 GitHub branch of Mahout that I started has an apparently
> working version of this algorithm.
>
> I would love to support anyone who wants to do last mile work on that
> stuff.
>
> See https://issues.apache.org/jira/browse/MAHOUT-525 for more info
>
> On Tue, Feb 1, 2011 at 1:52 AM, Sean Owen <sr...@gmail.com> wrote:
>
> > That Elkan / Menon paper has an elegant theoretical formulation of a
> > recommender that uses both ratings and side info at the same time.
> >
>

Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
And the MAHOUT-525 GitHub branch of Mahout that I started has an apparently
working version of this algorithm.

I would love to support anyone who wants to do last mile work on that stuff.

See https://issues.apache.org/jira/browse/MAHOUT-525 for more info

On Tue, Feb 1, 2011 at 1:52 AM, Sean Owen <sr...@gmail.com> wrote:

> That Elkan / Menon paper has an elegant theoretical formulation of a
> recommender that uses both ratings and side info at the same time.
>

Re: Recommending on Dynamic Content

Posted by Sean Owen <sr...@gmail.com>.
One approach is to use user-user similarities. Those build up over time
based on historical data, but can be used to produce recommendations for
brand-new items going forward.

It still has a cold-start problem; until anyone connects to one of those new
items, it can't be recommended.
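
For reference, the user-user approach in Taste is only a few lines (a
sketch; the data file name and neighborhood size are placeholders):

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserBasedSketch {
  public static void main(String[] args) throws Exception {
    // One "userID,itemID,preference" triple per line.
    DataModel model = new FileDataModel(new File("ratings.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Similarities build up from history, but a new item becomes
    // recommendable as soon as any neighbor touches it.
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(10, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    List<RecommendedItem> recs = recommender.recommend(123L, 5);
    for (RecommendedItem item : recs) {
      System.out.println(item);
    }
  }
}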

Another approach is to use the item's characteristics to determine some
notion of similarity, in the absence of clicks. That's what you're doing and
it's a great approach.

You can also consider hybrid approaches. You could try to mix
recommendations based on two different approaches -- clicks-based and
content-based. The problem is knowing how to mix things since the scores are
not at all comparable.

That Elkan / Menon paper has an elegant theoretical formulation of a
recommender that uses both ratings and side info at the same time.
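
One pragmatic dodge for the score-comparability problem is to combine by
rank rather than by raw score, e.g. summing weighted reciprocal ranks (my
sketch, not anything in Mahout):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Blend two recommenders whose scores live on different scales by
// combining reciprocal ranks instead of the raw scores.
public class RankBlender {
  public static Map<Long, Double> blend(List<Long> clicksBased,
                                        List<Long> contentBased,
                                        double weightClicks) {
    Map<Long, Double> combined = new HashMap<Long, Double>();
    addReciprocalRanks(combined, clicksBased, weightClicks);
    addReciprocalRanks(combined, contentBased, 1.0 - weightClicks);
    return combined;  // sort entries by value, descending, to recommend
  }

  private static void addReciprocalRanks(Map<Long, Double> acc,
                                         List<Long> ranked, double w) {
    for (int rank = 0; rank < ranked.size(); rank++) {
      Long item = ranked.get(rank);
      Double prev = acc.get(item);
      double score = w / (rank + 1.0);  // 1st place worth w, 2nd w/2, ...
      acc.put(item, prev == null ? score : prev + score);
    }
  }
}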


On Mon, Jan 31, 2011 at 11:26 PM, Gökhan Çapan <gk...@gmail.com> wrote:

> Hi,
>
> I've searched the archives; sorry in case this is a double post.
> Also, this question may not be directly related to Mahout.
>
> Within a domain which is entirely user-generated and has very high item
> churn (lots of new items arriving while others leave the system), what
> would you recommend for producing accurate recommendations using Mahout
> (not just Taste)?
>
> I mean, as a concrete example, the eBay domain, not Amazon's.
>
> Currently I am creating item clusters using LSH with MinHash (I am not sure
> if it is in Mahout; I can contribute it if it is not), and producing
> recommendations using these item clusters (profiles). When a new item
> arrives, I find its nearest profile, and recommend the item wherever that
> profile is recommended. Do you find this approach good enough?
>
> If you have a theoretical idea, could you please point me to some related
> papers?
>
> (As an MSc student, I can implement this as a Google Summer of Code
> project,
> with your mentoring.)
>
> Thanks in advance
>
> --
> Gokhan
>

Re: Recommending on Dynamic Content

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Yahoo is building what they call a 2-stage hierarchical model. I am not
arguing that they use EM etc. to solve the individual stages. I understand
that. I am not arguing that they are primarily motivated by solving the
cold-start problem. I understand that as well.

But what they build follows similar reasoning, if not the same, as here:
http://en.wikipedia.org/wiki/Hierarchical_Bayes_model. Is it not? It is
possible I am mixing things up here; this hierarchy is not directly
Bayesian, but the motivation is similar.

I am just saying that we can generalize the problem to hierarchies that
don't have to be 2-stage. That's all.
I am also saying that a practical problem I have at hand is more than
2-stage. I don't know what would be the best way to solve it. But it seems
to me that hierarchical learning analogous to these could be extended to a
more general case with multiple hierarchies on the side info or even
user/item content profiles.

For example, say a user and an item interact, and you always know the time
of day when it happens (just a sheer example), but sometimes (far from
always) you also happen to know the weather and/or the geo where it
happened. Can't we make use of that information with the addition of
another stage to the hierarchy?


On Wed, Feb 2, 2011 at 8:54 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> I am basically retracing the generalization of the Bayesian inference
> problem given in the Yahoo paper. I am too lazy to go back for a quote.
>
> The SVD problem was discussed at meetups; basically the criticism
> here is that for an RxC matrix, whenever there's a missing measurement,
> one can't specify 'no measurement' but rather has to leave it at some
> neutral value (0? the average?), which is essentially nothing but noise
> since it's not a sample. As one guy from Stanford demonstrated on
> Netflix data, the whole system collapses very quickly after a certain
> threshold of sample sparsity is reached.
>
> On Wed, Feb 2, 2011 at 7:20 PM, Ted Dunning <te...@gmail.com> wrote:
>> Dmitriy,
>> I am not clear what you are saying entirely, but as far as I can understand
>> your points, I think I disagree.  Of course, if I don't catch your drift, I
>> might be wrong and we might be in agreement.
>>
>> On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>>>
>>> Both Elkan's work and Yahoo's paper are based on the notion (which is
>>> confirmed by SGD experience) that if we try to substitute missing data with
>>> neutral values, the whole learning falls apart. Sort of.
>>
>> I don't see why you say that.  Elkan and Yahoo want to avoid the cold start
>> process by using user and item offsets and by using latent factors to smooth
>> the recommendation process.
>>
>>>
>>> I.e. if we always know some context A (in this case, static labels and
>>> dyadic ids) and only sometimes some context B, then assuming neutral values
>>> for context B if we are missing this data is invalid because we are actually
>>> substituting unknown data with made-up data.
>>
>> This is so abstract that I don't know what you are referring to, really.  Yes,
>> static characteristics will be used if they are available and latent factors
>> will be used if they are available.
>>
>>>
>>> Which is why SGD produces higher errors than necessary on sparsified label
>>> data. This is also the reason why SVD recommenders produce higher errors
>>> over sparse sample data as well (I think that's the consensus).
>>
>> I don't think I am part of that consensus.
>> SGD produces very low errors when used with sparse data.  But it can also
>> use non-sparse features just as well.  What do you mean by "higher errors than
>> necessary"?  That lower error rates are possible with latent factor
>> techniques?
>>
>>>
>>> However, thinking in offline-ish mode, if we learn based on samples with A
>>> data, then freeze the learner and learn based on the error between the
>>> frozen learner for A and only the input that has context B, for learner B,
>>> then we are not making the mistake per above. At no point does our learner
>>> take any 'made-up' data.
>>
>> Are you talking about the alternating learning process in Menon and Elkan?
>>
>>>
>>> This whole notion is based on the Bayesian inference process: what can you
>>> say if you only know A, and what correction would you make if you also knew
>>> B.
>>
>> ?!??
>> The process is roughly analogous to an EM algorithm, but not very.
>>
>>>
>>> Both papers make a corner case out of this: we have two types of data, A
>>> and B, and we learn A, then freeze learner A, then learn B where available.
>>>
>>> But the general case doesn't have to be A and B. Actually that's our case
>>> (our CEO calls it the 'trunk-branch-leaf' case): we always know some context
>>> A, and sometimes B, and also sometimes we know all of A, B and some
>>> additional context C.
>>>
>>> So there's a case to be made to generalize the inference architecture:
>>> specify a hierarchy and then learn A/B/C, SGD+log-linear, or whatever else.
>>
>> I think that these analogies are very strained.
>>
>>
>

Re: Recommending on Dynamic Content

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Yes, I was referring to Andrea Montanari. My apologies for the "guy from
Stanford" reference. I wasn't aware of the paper, but I was present at his
talk about this work; it was quite informative.

On Wed, Feb 2, 2011 at 11:49 PM, Federico Castanedo <fc...@inf.uc3m.es> wrote:

> Hi all,
>
> Dmitriy, I guess you are talking about this paper by Andrea Montanari, am I
> correct?
>
> Matrix Completion from Noisy Entries. http://arxiv.org/abs/0906.2027v1
>
> 2011/2/3 Dmitriy Lyubimov <dl...@gmail.com>
>
> > I am basically retracing the generalization of the Bayesian inference
> > problem given in the Yahoo paper. I am too lazy to go back for a quote.
> >
> > The SVD problem was discussed at meetups; basically the criticism
> > here is that for an RxC matrix, whenever there's a missing measurement,
> > one can't specify 'no measurement' but rather has to leave it at some
> > neutral value (0? the average?), which is essentially nothing but noise
> > since it's not a sample. As one guy from Stanford demonstrated on
> > Netflix data, the whole system collapses very quickly after a certain
> > threshold of sample sparsity is reached.
> >
> > On Wed, Feb 2, 2011 at 7:20 PM, Ted Dunning <te...@gmail.com>
> wrote:
> > > Dmitriy,
> > > I am not clear what you are saying entirely, but as far as I can
> > > understand your points, I think I disagree.  Of course, if I don't
> > > catch your drift, I might be wrong and we might be in agreement.
> > >
> > > On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <dl...@gmail.com>
> > wrote:
> > >>
> > >> Both Elkan's work and Yahoo's paper are based on the notion (which is
> > >> confirmed by SGD experience) that if we try to substitute missing data
> > >> with neutral values, the whole learning falls apart. Sort of.
> > >
> > > I don't see why you say that.  Elkan and Yahoo want to avoid the cold
> > > start process by using user and item offsets and by using latent
> > > factors to smooth the recommendation process.
> > >
> > >>
> > >> I.e. if we always know some context A (in this case, static labels and
> > >> dyadic ids) and only sometimes some context B, then assuming neutral
> > >> values for context B if we are missing this data is invalid because we
> > >> are actually substituting unknown data with made-up data.
> > >
> > > This is so abstract that I don't know what you are referring to,
> > > really.  Yes, static characteristics will be used if they are available
> > > and latent factors will be used if they are available.
> > >
> > >>
> > >> Which is why SGD produces higher errors than necessary on sparsified
> > >> label data. This is also the reason why SVD recommenders produce
> > >> higher errors over sparse sample data as well (I think that's the
> > >> consensus).
> > >
> > > I don't think I am part of that consensus.
> > > SGD produces very low errors when used with sparse data.  But it can
> > > also use non-sparse features just as well.  What do you mean by "higher
> > > errors than necessary"?  That lower error rates are possible with
> > > latent factor techniques?
> > >
> > >>
> > >> However, thinking in offline-ish mode, if we learn based on samples
> > >> with A data, then freeze the learner and learn based on the error
> > >> between the frozen learner for A and only the input that has context
> > >> B, for learner B, then we are not making the mistake per above. At no
> > >> point does our learner take any 'made-up' data.
> > >
> > > Are you talking about the alternating learning process in Menon and
> > Elkan?
> > >
> > >>
> > >> This whole notion is based on the Bayesian inference process: what
> > >> can you say if you only know A, and what correction would you make if
> > >> you also knew B.
> > >
> > > ?!??
> > > The process is roughly analogous to an EM algorithm, but not very.
> > >
> > >>
> > >> Both papers make a corner case out of this: we have two types of
> > >> data, A and B, and we learn A, then freeze learner A, then learn B
> > >> where available.
> > >>
> > >> But the general case doesn't have to be A and B. Actually that's our
> > >> case (our CEO calls it the 'trunk-branch-leaf' case): we always know
> > >> some context A, and sometimes B, and also sometimes we know all of A,
> > >> B and some additional context C.
> > >>
> > >> So there's a case to be made to generalize the inference
> > >> architecture: specify a hierarchy and then learn A/B/C, SGD+log-linear,
> > >> or whatever else.
> > >
> > > I think that these analogies are very strained.
> > >
> > >
> >
>

Re: Recommending on Dynamic Content

Posted by Federico Castanedo <fc...@inf.uc3m.es>.
Hi all,

Dmitriy, I guess you are talking about this paper by Andrea Montanari, am I
correct?

Matrix Completion from Noisy Entries. http://arxiv.org/abs/0906.2027v1

2011/2/3 Dmitriy Lyubimov <dl...@gmail.com>

> I am basically retracing the generalization of the Bayesian inference
> problem given in the Yahoo paper. I am too lazy to go back for a quote.
>
> The SVD problem was discussed at meetups; basically the criticism
> here is that for an RxC matrix, whenever there's a missing measurement,
> one can't specify 'no measurement' but rather has to leave it at some
> neutral value (0? the average?), which is essentially nothing but noise
> since it's not a sample. As one guy from Stanford demonstrated on
> Netflix data, the whole system collapses very quickly after a certain
> threshold of sample sparsity is reached.
>
> On Wed, Feb 2, 2011 at 7:20 PM, Ted Dunning <te...@gmail.com> wrote:
> > Dmitriy,
> > I am not clear what you are saying entirely, but as far as I can
> > understand your points, I think I disagree.  Of course, if I don't catch
> > your drift, I might be wrong and we might be in agreement.
> >
> > On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
> >>
> >> Both Elkan's work and Yahoo's paper are based on the notion (which is
> >> confirmed by SGD experience) that if we try to substitute missing data
> >> with neutral values, the whole learning falls apart. Sort of.
> >
> > I don't see why you say that.  Elkan and Yahoo want to avoid the cold
> > start process by using user and item offsets and by using latent factors
> > to smooth the recommendation process.
> >
> >>
> >> I.e. if we always know some context A (in this case, static labels and
> >> dyadic ids) and only sometimes some context B, then assuming neutral
> >> values for context B if we are missing this data is invalid because we
> >> are actually substituting unknown data with made-up data.
> >
> > This is so abstract that I don't know what you are referring to,
> > really.  Yes, static characteristics will be used if they are available
> > and latent factors will be used if they are available.
> >
> >>
> >> Which is why SGD produces higher errors than necessary on sparsified
> >> label data. This is also the reason why SVD recommenders produce higher
> >> errors over sparse sample data as well (I think that's the consensus).
> >
> > I don't think I am part of that consensus.
> > SGD produces very low errors when used with sparse data.  But it can also
> > use non-sparse features just as well.  What do you mean by "higher
> > errors than necessary"?  That lower error rates are possible with latent
> > factor techniques?
> >
> >>
> >> However, thinking in offline-ish mode, if we learn based on samples
> >> with A data, then freeze the learner and learn based on the error
> >> between the frozen learner for A and only the input that has context B,
> >> for learner B, then we are not making the mistake per above. At no point
> >> does our learner take any 'made-up' data.
> >
> > Are you talking about the alternating learning process in Menon and
> Elkan?
> >
> >>
> >> This whole notion is based on the Bayesian inference process: what can
> >> you say if you only know A, and what correction would you make if you
> >> also knew B.
> >
> > ?!??
> > The process is roughly analogous to an EM algorithm, but not very.
> >
> >>
> >> Both papers make a corner case out of this: we have two types of data,
> >> A and B, and we learn A, then freeze learner A, then learn B where
> >> available.
> >>
> >> But the general case doesn't have to be A and B. Actually that's our
> >> case (our CEO calls it the 'trunk-branch-leaf' case): we always know some
> >> context A, and sometimes B, and also sometimes we know all of A, B and
> >> some additional context C.
> >>
> >> So there's a case to be made to generalize the inference architecture:
> >> specify a hierarchy and then learn A/B/C, SGD+log-linear, or whatever else.
> >
> > I think that these analogies are very strained.
> >
> >
>

Re: Recommending on Dynamic Content

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Federico, thanks for the reference.

On Thu, Feb 3, 2011 at 1:55 AM, Federico Castanedo <fc...@inf.uc3m.es> wrote:

> Hi Dmitriy,
>
> I'm not sure if this algorithm:
>
> http://www.stanford.edu/~raghuram/optspace/index.html
>
> could help in the case of missing information in SGD, but it seems they
> have a very efficient approach in the case of unknown ratings in CF tasks
> using SVD.
>
> 2011/2/3 Dmitriy Lyubimov <dl...@gmail.com>
>
> > And i was referring to SVD recommender, not SGD here. SGD indeed takes
> > care of that kind of problem since it doesn't examine "empty cells" in
> > case of latent factor computation during solving factorization
> > problems.
> >
> > But I think there's similar problem with missing side information
> > labels in case of SGD: say we have a bunch of probes and we are
> > reading signals off of them at certain intervals. but now and then we
> > fail to read some of them. Actually, we fail pretty often. But regular
> > SGD doesn't 'freeze' learning for inputs we failed to read off. We are
> > forced to put some values there; and least harmless, it seems, is the
> > average, since it doesn't cause any learning to happen on that
> > particular input. But I think it does cause regularization to count a
> > generation thus cancelling some of the learning. Whereas if we grouped
> > missing inputs into separate learners and did hierarchical learning,
> > that would not be happening. That's what i meant by SGD producing
> > slightly more erorrs in this case compared to what  it seems to be
> > possible to do with hierarchies.
> >
> > similarity between those cases (sparse SVD and SGD inputs) is that in
> > every case we are forced to feed a 'made-up' data to learners, because
> > we failed to observe it in a sample.
> >
> > On Wed, Feb 2, 2011 at 11:05 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> > > That is a property of sparsity and connectedness, not SGD.
> > >
> > > On Wed, Feb 2, 2011 at 8:54 PM, Dmitriy Lyubimov <dl...@gmail.com>
> > wrote:
> > >>
> > >> As one guy from Stanford demonstrated on
> > >> Netflix data, the whole system collapses very quickly after a certain
> > >> threshold of sample sparsity is reached.
> > >
> > >
> >
>

Re: Recommending on Dynamic Content

Posted by Federico Castanedo <fc...@inf.uc3m.es>.
Hi Dmitriy,

I'm not sure if this algorithm:

http://www.stanford.edu/~raghuram/optspace/index.html

could help in the case of missing information in SGD, but it seems they
have a very efficient approach in the case of unknown ratings in CF tasks
using SVD.

2011/2/3 Dmitriy Lyubimov <dl...@gmail.com>

> And I was referring to the SVD recommender, not SGD, here. SGD indeed takes
> care of that kind of problem since it doesn't examine "empty cells" when
> computing latent factors while solving factorization problems.
>
> But I think there's a similar problem with missing side information
> labels in the case of SGD: say we have a bunch of probes and we are
> reading signals off of them at certain intervals, but now and then we
> fail to read some of them. Actually, we fail pretty often. But regular
> SGD doesn't 'freeze' learning for inputs we failed to read off. We are
> forced to put some values there; and the least harmful, it seems, is the
> average, since it doesn't cause any learning to happen on that
> particular input. But I think it does cause regularization to count a
> generation, thus cancelling some of the learning. Whereas if we grouped
> missing inputs into separate learners and did hierarchical learning,
> that would not be happening. That's what I meant by SGD producing
> slightly more errors in this case compared to what it seems to be
> possible to do with hierarchies.
>
> The similarity between those cases (sparse SVD and SGD inputs) is that in
> every case we are forced to feed 'made-up' data to learners, because
> we failed to observe it in a sample.
>
> On Wed, Feb 2, 2011 at 11:05 PM, Ted Dunning <te...@gmail.com>
> wrote:
> > That is a property of sparsity and connectedness, not SGD.
> >
> > On Wed, Feb 2, 2011 at 8:54 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
> >>
> >> As one guy from Stanford demonstrated on
> >> Netflix data, the whole system collapses very quickly after a certain
> >> threshold of sample sparsity is reached.
> >
> >
>

Re: Recommending on Dynamic Content

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
And I was referring to the SVD recommender, not SGD, here. SGD indeed takes
care of that kind of problem since it doesn't examine "empty cells" when
computing latent factors while solving factorization problems.

But I think there's a similar problem with missing side information
labels in the case of SGD: say we have a bunch of probes and we are
reading signals off of them at certain intervals, but now and then we
fail to read some of them. Actually, we fail pretty often. But regular
SGD doesn't 'freeze' learning for inputs we failed to read off. We are
forced to put some values there; and the least harmful, it seems, is the
average, since it doesn't cause any learning to happen on that
particular input. But I think it does cause regularization to count a
generation, thus cancelling some of the learning. Whereas if we grouped
missing inputs into separate learners and did hierarchical learning,
that would not be happening. That's what I meant by SGD producing
slightly more errors in this case compared to what it seems to be
possible to do with hierarchies.

The similarity between those cases (sparse SVD and SGD inputs) is that in
every case we are forced to feed 'made-up' data to learners, because
we failed to observe it in a sample.
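
To make the contrast concrete, here is a toy squared-loss SGD of my own
(not Mahout's) that updates only the features actually observed; an
unobserved probe gets neither a gradient step nor a weight-decay step,
instead of being fed the average:

// Sketch of the distinction discussed above: train on the observed
// features only. A feature we failed to read gets no gradient update
// and no regularization (weight-decay) update, so no learning is
// cancelled for inputs we failed to read off.
public class SparseSgd {
  private final double[] w;
  private final double rate = 0.01;
  private final double lambda = 1e-4;

  public SparseSgd(int numFeatures) {
    w = new double[numFeatures];
  }

  // indices/values hold only the probes that were actually read.
  public void train(int[] indices, double[] values, double target) {
    double pred = 0;
    for (int k = 0; k < indices.length; k++) {
      pred += w[indices[k]] * values[k];
    }
    double err = target - pred;
    for (int k = 0; k < indices.length; k++) {
      int j = indices[k];
      // gradient step plus L2 decay, applied only to observed features
      w[j] += rate * (err * values[k] - lambda * w[j]);
    }
  }
}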

On Wed, Feb 2, 2011 at 11:05 PM, Ted Dunning <te...@gmail.com> wrote:
> That is a property of sparsity and connectedness, not SGD.
>
> On Wed, Feb 2, 2011 at 8:54 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>>
>> As one guy from Stanford demonstrated on
>> Netflix data, the whole system collapses very quickly after a certain
>> threshold of sample sparsity is reached.
>
>

Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
That is a property of sparsity and connectedness, not SGD.

On Wed, Feb 2, 2011 at 8:54 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> As one guy from Stanford demonstrated on
> Netflix data, the whole system collapses very quickly after a certain
> threshold of sample sparsity is reached.
>

Re: Recommending on Dynamic Content

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I am basically retracing the generalization of the Bayesian inference
problem given in the Yahoo paper. I am too lazy to go back for a quote.

The SVD problem was discussed at meetups; basically the criticism
here is that for an RxC matrix, whenever there's a missing measurement,
one can't specify 'no measurement' but rather has to leave it at some
neutral value (0? the average?), which is essentially nothing but noise
since it's not a sample. As one guy from Stanford demonstrated on
Netflix data, the whole system collapses very quickly after a certain
threshold of sample sparsity is reached.

On Wed, Feb 2, 2011 at 7:20 PM, Ted Dunning <te...@gmail.com> wrote:
> Dmitriy,
> I am not clear what you are saying entirely, but as far as I can understand
> your points, I think I disagree.  Of course, if I don't catch your drift, I
> might be wrong and we might be in agreement.
>
> On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>>
>> Both Elkan's work and Yahoo's paper are based on the notion (which is
>> confirmed by SGD experience) that if we try to substitute missing data with
>> neutral values, the whole learning falls apart. Sort of.
>
> I don't see why you say that.  Elkan and Yahoo want to avoid the cold start
> process by using user and item offsets and by using latent factors to smooth
> the recommendation process.
>
>>
>> I.e. if we always know some context A (in this case, static labels and
>> dyadic ids) and only sometimes some context B, then assuming neutral values
>> for context B if we are missing this data is invalid because we are actually
>> substituting unknown data with made-up data.
>
> This is so abstract that I don't know what you are referring to, really.  Yes,
> static characteristics will be used if they are available and latent factors
> will be used if they are available.
>
>>
>> Which is why SGD produces higher errors than necessary on sparsified label
>> data. This is also the reason why SVD recommenders produce higher errors
>> over sparse sample data as well (I think that's the consensus).
>
> I don't think I am part of that consensus.
> SGD produces very low errors when used with sparse data.  But it can also
> use non-sparse features just as well.  What do you mean by "higher errors than
> necessary"?  That lower error rates are possible with latent factor
> techniques?
>
>>
>> However, thinking in offline-ish mode, if we learn based on samples with A
>> data, then freeze the learner and learn based on the error between the
>> frozen learner for A and only the input that has context B, for learner B,
>> then we are not making the mistake per above. At no point does our learner
>> take any 'made-up' data.
>
> Are you talking about the alternating learning process in Menon and Elkan?
>
>>
>> This whole notion is based on the Bayesian inference process: what can you
>> say if you only know A, and what correction would you make if you also knew
>> B.
>
> ?!??
> The process is roughly analogous to an EM algorithm, but not very.
>
>>
>> Both papers make a corner case out of this: we have two types of data, A
>> and B, and we learn A, then freeze learner A, then learn B where available.
>>
>> But the general case doesn't have to be A and B. Actually that's our case
>> (our CEO calls it the 'trunk-branch-leaf' case): we always know some context
>> A, and sometimes B, and also sometimes we know all of A, B and some
>> additional context C.
>>
>> So there's a case to be made to generalize the inference architecture:
>> specify a hierarchy and then learn A/B/C, SGD+log-linear, or whatever else.
>
> I think that these analogies are very strained.
>
>

Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
Dmitriy,

I am not clear what you are saying entirely, but as far as I can understand
your points, I think I disagree.  Of course, if I don't catch your drift, I
might be wrong and we might be in agreement.

On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Both Elkan's work and Yahoo's paper are based on the notion (which is
> confirmed by SGD experience) that if we try to substitute missing data with
> neutral values, the whole learning falls apart. Sort of.
>

I don't see why you say that.  Elkan and Yahoo want to avoid the cold start
process by using user and item offsets and by using latent factors to smooth
the recommendation process.


> I.e. if we always know some context A (in this case, static labels and
> dyadic ids) and only sometimes some context B, then assuming neutral values
> for context B if we are missing this data is invalid because we are actually
> substituting unknown data with made-up data.


This is so abstract that I don't know what you are referring to, really.  Yes,
static characteristics will be used if they are available and latent factors
will be used if they are available.


> Which is why SGD produces higher errors than necessary on sparsified label
> data. This is also the reason why SVD recommenders produce higher errors
> over sparse sample data (I think that's the consensus).
>

I don't think I am part of that consensus.

SGD produces very low errors when used with sparse data.  But it can also
use non-sparse features just as well.  What do you mean by "higher errors than
necessary"?  That lower error rates are possible with latent factor
techniques?


>
> However, thinking in offline-ish mode, if we learn based on samples with A
> data, then freeze the learner and learn based on the error between the frozen
> learner for A and only the input that has context B, for learner B, then we
> are not making the mistake per above. At no point does our learner take any
> 'made-up' data.
>

Are you talking about the alternating learning process in Menon and Elkan?


>
> This whole notion is based on the Bayesian inference process: what can you
> say if you only know A, and what correction would you make if you also knew B.
>

?!??

The process is roughly analogous to an EM algorithm, but not very.


> Both papers make a corner case out of this: we have two types of data, A and
> B, and we learn A, then freeze learner A, then learn B where available.
>
> But the general case doesn't have to be A and B. Actually that's our case
> (our CEO calls it the 'trunk-branch-leaf' case): we always know some context
> A, and sometimes B, and also sometimes we know all of A, B and some
> additional context C.
>
> So there's a case to be made to generalize the inference architecture:
> specify a hierarchy and then learn A/B/C, SGD+log-linear, or whatever else.
>

I think that these analogies are very strained.

Re: Recommending on Dynamic Content

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Actually, our case is even a little more complex: our hierarchy may be
A/[B/[C|D]], i.e. for some inputs the full hierarchy is A/B/C and for some
inputs it is A/B/D, mutually exclusive. Technically, both hierarchies could
be re-learned independently; but it stands to reason that the A and B
learners should not have to be re-learned independently, just to save on the
computation.

Ted has mentioned there's a hierarchy in Mahout; I wonder if it can handle
the case presented, and what class I might look at to see how to set this
up.

-d

On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Both Elkan's work and Yahoo's paper are based on the notion (which is
> confirmed by SGD experience) that if we try to substitute missing data with
> neutral values, the whole learning falls apart. Sort of.
>
> I.e. if we always know some context A (in this case, static labels and
> dyadic ids) and only sometimes some context B, then assuming neutral values
> for context B if we are missing this data is invalid because we are actually
> substituting unknown data with made-up data. Which is why SGD produces
> higher errors than necessary on sparsified label data. This is also the
> reason why SVD recommenders produce higher errors over sparse sample data as
> well (I think that's the consensus).
>
> However, thinking in offline-ish mode, if we learn based on samples with A
> data, then freeze the learner and learn based on the error between the frozen
> learner for A and only the input that has context B, for learner B, then we
> are not making the mistake per above. At no point does our learner take
> any 'made-up' data.
>
> This whole notion is based on the Bayesian inference process: what can you
> say if you only know A, and what correction would you make if you also knew B.
>
> Both papers make a corner case out of this: we have two types of data, A
> and B, and we learn A, then freeze learner A, then learn B where available.
>
> But the general case doesn't have to be A and B. Actually that's our case
> (our CEO calls it the 'trunk-branch-leaf' case): we always know some context
> A, and sometimes B, and also sometimes we know all of A, B and some
> additional context C.
>
> So there's a case to be made to generalize the inference architecture:
> specify a hierarchy and then learn A/B/C, SGD+log-linear, or whatever else.
>
> -d
>
>
On Wed, Feb 2, 2011 at 12:14 AM, Sebastian Schelter <ss...@apache.org> wrote:
>
>> Hi Ted,
>>
>> I looked through the paper a while ago. The approach seems to have great
>> potential, especially because of the ability to include side information and
>> to work with nominal and ordinal data. Unfortunately, I have to admit that
>> a lot of the mathematical details are beyond my understanding. I'd be ready
>> to assist anyone willing to build a recommender from that approach, but
>> it's not a thing I could tackle on my own.
>>
>> --sebastian
>>
>> PS: The algorithm took 7 minutes to learn from the MovieLens 1M dataset,
>> not Netflix.
>>
>>
>> On 01.02.2011 18:02, Ted Dunning wrote:
>>
>>>
>>> Sebastian,
>>>
>>> Have you read the Elkan paper?  Are you interested in (partially) content
>>> based recommendation?
>>>
>>> On Tue, Feb 1, 2011 at 2:02 AM, Sebastian Schelter <ssc@apache.org> wrote:
>>>
>>>    Hi Gökhan,
>>>
>>>    I wanna point you to some papers I came across that deal with
>>>    similar problems:
>>>
>>>    "Google News Personalization: Scalable Online Collaborative
>>>    Filtering" ( http://www2007.org/papers/paper570.pdf ), this paper
>>>    describes how Google uses three algorithms (two of which cluster
>>>    the users) to achieve online recommendation of news articles.
>>>
>>>    "Feature-based recommendation system" (
>>>    http://glaros.dtc.umn.edu/gkhome/fetch/papers/fbrsCIKM05.pdf ),
>>>    this approach didn't really convince me and I think the paper is
>>>    lacking a lot of details, but it might still be an interesting read.
>>>
>>>    --sebastian
>>>
>>>    On 01.02.2011 00:26, Gökhan Çapan wrote:
>>>
>>>        Hi,
>>>
>>>        I've searched the archives; sorry in case this is a double post.
>>>        Also, this question may not be directly related to Mahout.
>>>
>>>        Within a domain which is entirely user-generated and has very
>>>        high item churn (lots of new items arriving while others leave
>>>        the system), what would you recommend for producing accurate
>>>        recommendations using Mahout (not just Taste)?
>>>
>>>        I mean, as a concrete example, the eBay domain, not Amazon's.
>>>
>>>        Currently I am creating item clusters using LSH with MinHash
>>>        (I am not sure if it is in Mahout; I can contribute it if it is
>>>        not), and producing recommendations using these item clusters
>>>        (profiles). When a new item arrives, I find its nearest profile,
>>>        and recommend the item wherever that profile is recommended. Do
>>>        you find this approach good enough?
>>>
>>>        If you have a theoretical idea, could you please point me to
>>>        some related
>>>        papers?
>>>
>>>        (As an MSc student, I can implement this as a Google Summer of
>>>        Code project,
>>>        with your mentoring.)
>>>
>>>        Thanks in advance
>>>
>>>
>>>
>>>
>>
>

Re: Recommending on Dynamic Content

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Both Elkan's work and Yahoo's paper are based on the notion (which is
confirmed by SGD experience) that if we try to substitute missing data with
neutral values, the whole learning falls apart. Sort of.

I.e. if we always know some context A (in this case, static labels and
dyadic ids) and only sometimes some context B, then assuming neutral values
for context B if we are missing this data is invalid because we are actually
substituting unknown data with made-up data. Which is why SGD produces
higher errors than necessary on sparsified label data. This is also the
reason why SVD recommenders produce higher errors over sparse sample data as
well (I think that's the consensus).

However, thinking in offline-ish mode, if we learn based on samples with A
data, then freeze the learner and learn based on the error between the frozen
learner for A and only the input that has context B, for learner B, then we
are not making the mistake per above. At no point does our learner take any
'made-up' data.

This whole notion is based on the Bayesian inference process: what can you
say if you only know A, and what correction would you make if you also knew
B.

Both papers make a corner case out of this: we have two types of data, A and
B, and we learn A, then freeze learner A, then learn B where available.

But the general case doesn't have to be A and B. Actually that's our case
(our CEO calls it the 'trunk-branch-leaf' case): we always know some context
A, and sometimes B, and also sometimes we know all of A, B and some
additional context C.

So there's a case to be made to generalize the inference architecture:
specify a hierarchy and then learn A/B/C, SGD+log-linear, or whatever else.
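
A toy sketch of the freeze-then-correct cascade I mean (generic; the stage
learners could be SGD, log-linear, whatever):

// Freeze-then-correct cascade: learner A is trained on examples where
// only context A is known; learner B is trained, with A frozen, on the
// residual error of A using examples that also carry context B. At
// prediction time each stage contributes only if its context exists,
// so no stage ever sees made-up values for a missing context.
public class CascadeSketch {
  public interface Stage {
    void train(double[] ctx, double target);
    double predict(double[] ctx);
  }

  private final Stage a;
  private final Stage b;

  public CascadeSketch(Stage a, Stage b) {
    this.a = a;
    this.b = b;
  }

  public void trainA(double[] ctxA, double target) {
    a.train(ctxA, target);
  }

  // Called only for examples where context B was observed; A is
  // already frozen by this point.
  public void trainB(double[] ctxA, double[] ctxB, double target) {
    b.train(ctxB, target - a.predict(ctxA));
  }

  public double predict(double[] ctxA, double[] ctxB /* may be null */) {
    double y = a.predict(ctxA);
    if (ctxB != null) {
      y += b.predict(ctxB);
    }
    return y;
  }
}

Extending to C (or the C|D case) is just more residual stages, each
trained and applied only when its context was actually observed.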

-d

On Wed, Feb 2, 2011 at 12:14 AM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Ted,
>
> I looked through the paper a while ago. The approach seems to have great
> potential, especially because of the ability to include side information and
> to work with nominal and ordinal data. Unfortunately, I have to admit that
> a lot of the mathematical details are beyond my understanding. I'd be ready
> to assist anyone willing to build a recommender from that approach, but
> it's not a thing I could tackle on my own.
>
> --sebastian
>
> PS: The algorithm took 7 minutes to learn from the MovieLens 1M dataset,
> not Netflix.
>
>
> On 01.02.2011 18:02, Ted Dunning wrote:
>
>>
>> Sebastian,
>>
>> Have you read the Elkan paper?  Are you interested in (partially) content
>> based recommendation?
>>
>> On Tue, Feb 1, 2011 at 2:02 AM, Sebastian Schelter <ssc@apache.org> wrote:
>>
>>    Hi Gökhan,
>>
>>    I wanna point you to some papers I came across that deal with
>>    similar problems:
>>
>>    "Google News Personalization: Scalable Online Collaborative
>>    Filtering" ( http://www2007.org/papers/paper570.pdf ), this paper
>>    describes how Google uses three algorithms (two of which cluster
>>    the users) to achieve online recommendation of news articles.
>>
>>    "Feature-based recommendation system" (
>>    http://glaros.dtc.umn.edu/gkhome/fetch/papers/fbrsCIKM05.pdf ),
>>    this approach didn't really convince me and I think the paper is
>>    lacking a lot of details, but it might still be an interesting read.
>>
>>    --sebastian
>>
>>    On 01.02.2011 00:26, Gökhan Çapan wrote:
>>
>>        Hi,
>>
>>        I've searched the archives; sorry in case this is a double post.
>>        Also, this question may not be directly related to Mahout.
>>
>>        Within a domain which is entirely user-generated and has very
>>        high item churn (lots of new items arriving while others leave
>>        the system), what would you recommend for producing accurate
>>        recommendations using Mahout (not just Taste)?
>>
>>        I mean, as a concrete example, the eBay domain, not Amazon's.
>>
>>        Currently I am creating item clusters using LSH with MinHash
>>        (I am not sure if it is in Mahout; I can contribute it if it is
>>        not), and producing recommendations using these item clusters
>>        (profiles). When a new item arrives, I find its nearest profile,
>>        and recommend the item wherever that profile is recommended. Do
>>        you find this approach good enough?
>>
>>        If you have a theoretical idea, could you please point me to
>>        some related
>>        papers?
>>
>>        (As an MSc student, I can implement this as a Google Summer of
>>        Code project,
>>        with your mentoring.)
>>
>>        Thanks in advance
>>
>>
>>
>>
>

Re: Recommending on Dynamic Content

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Ted,

I looked through the paper a while ago. The approach seems to have great 
potential, especially because of the ability to include side information 
and to work with nominal and ordinal data. Unfortunately I have to admit 
that a lot of the mathematical details overextend my understanding. I'd 
be ready to assist anyone willing to build a recommender from that 
approach but it's not a thing I could tackle on my own.

--sebastian

PS: The algorithm took 7 minutes to learn from the MovieLens 1M dataset,
not Netflix.

On 01.02.2011 18:02, Ted Dunning wrote:
>
> Sebastian,
>
> Have you read the Elkan paper?  Are you interested in (partially) 
> content based recommendation?
>
> On Tue, Feb 1, 2011 at 2:02 AM, Sebastian Schelter <ssc@apache.org> wrote:
>
>     Hi Gökhan,
>
>     I wanna point you to some papers I came across that deal with
>     similar problems:
>
>     "Google News Personalization: Scalable Online Collaborative
>     Filtering" ( http://www2007.org/papers/paper570.pdf ), this paper
>     describes how Google uses three algorithms (two of which cluster
>     the users) to achieve online recommendation of news articles.
>
>     "Feature-based recommendation system" (
>     http://glaros.dtc.umn.edu/gkhome/fetch/papers/fbrsCIKM05.pdf ),
>     this approach didn't really convince me and I think the paper is
>     lacking a lot of details, but it might still be an interesting read.
>
>     --sebastian
>
>     On 01.02.2011 00:26, Gökhan Çapan wrote:
>
>         Hi,
>
>         I've searched the archives; sorry in case this is a double post.
>         Also, this question may not be directly related to Mahout.
>
>         Within a domain which is entirely user-generated and has very
>         high item churn (lots of new items arriving while others leave
>         the system), what would you recommend for producing accurate
>         recommendations using Mahout (not just Taste)?
>
>         I mean, as a concrete example, the eBay domain, not Amazon's.
>
>         Currently I am creating item clusters using LSH with MinHash
>         (I am not sure if it is in Mahout; I can contribute it if it is
>         not), and producing recommendations using these item clusters
>         (profiles). When a new item arrives, I find its nearest profile,
>         and recommend the item wherever that profile is recommended. Do
>         you find this approach good enough?
>
>         If you have a theoretical idea, could you please point me to
>         some related
>         papers?
>
>         (As an MSc student, I can implement this as a Google Summer of
>         Code project,
>         with your mentoring.)
>
>         Thanks in advance
>
>
>


Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
I don't like their approach.  It doesn't matter what method you use to
optimize the assignment of the co-clusters; they still lack expressive
power.  I also don't like non-convex optimizations in general.

On Tue, Feb 1, 2011 at 11:55 AM, vineet yadav
<vi...@gmail.com> wrote:

> Hi Ted,
> Yes, in the paper they mention that "locally optimized
> co-clustering gives poor results in iterative learning", so they use
> evolutionary co-clustering, which gives better results.
> Thanks
> Vineet Yadav

Re: Recommending on Dynamic Content

Posted by vineet yadav <vi...@gmail.com>.
Hi Ted,
Yes, in the paper they mention that "locally optimized
co-clustering gives poor results in iterative learning", so they use
evolutionary co-clustering, which gives better results.
Thanks
Vineet Yadav

On Wed, Feb 2, 2011 at 1:12 AM, Ted Dunning <te...@gmail.com> wrote:

> Co-clustering typically doesn't give really hot results (at least in my
> reading and experience).

Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
Co-clustering typically doesn't give really hot results (at least in my
reading and experience).

On Tue, Feb 1, 2011 at 11:25 AM, vineet yadav
<vi...@gmail.com> wrote:

> Hi Gökhan,
> Also check out the paper "Incremental Collaborative Filtering via
> Evolutionary Co-clustering"
> (http://www.dollar.biz.uiowa.edu/~street/research/recsys10_ecoc.pdf). In
> the paper, the authors propose a method to incorporate new data into a
> collaborative filtering model incrementally. Co-clustering is used to
> cluster rows and columns (items and users) simultaneously. Also check the
> master's thesis "RECOMMENDING ARTICLES FOR AN ONLINE NEWSPAPER"
> (http://www.ilk.uvt.nl/downloads/pub/papers/hait/kneepkens2009.pdf).
> Thanks
> Vineet Yadav

Re: Recommending on Dynamic Content

Posted by vineet yadav <vi...@gmail.com>.
Hi Gökhan,
Also check out the paper "Incremental Collaborative Filtering via
Evolutionary Co-clustering"
(http://www.dollar.biz.uiowa.edu/~street/research/recsys10_ecoc.pdf). In the
paper, the authors propose a method to incorporate new data into a
collaborative filtering model incrementally. Co-clustering is used to cluster
rows and columns (items and users) simultaneously. Also check the master's
thesis "RECOMMENDING ARTICLES FOR AN ONLINE NEWSPAPER"
(http://www.ilk.uvt.nl/downloads/pub/papers/hait/kneepkens2009.pdf).
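
For concreteness, here is a rough sketch of the usual co-clustering
prediction rule (co-cluster average plus the user's and item's offsets from
their cluster averages). This is a generic illustration with invented names
and precomputed averages, not code from either paper:

public class CoClusterPredictor {

    private final int[] userCluster;       // user -> row-cluster index
    private final int[] itemCluster;       // item -> column-cluster index
    private final double[] userAvg;        // per-user mean rating
    private final double[] itemAvg;        // per-item mean rating
    private final double[] userClusterAvg; // per-row-cluster mean rating
    private final double[] itemClusterAvg; // per-column-cluster mean rating
    private final double[][] coClusterAvg; // mean rating per cluster pair

    public CoClusterPredictor(int[] userCluster, int[] itemCluster,
                              double[] userAvg, double[] itemAvg,
                              double[] userClusterAvg, double[] itemClusterAvg,
                              double[][] coClusterAvg) {
        this.userCluster = userCluster;
        this.itemCluster = itemCluster;
        this.userAvg = userAvg;
        this.itemAvg = itemAvg;
        this.userClusterAvg = userClusterAvg;
        this.itemClusterAvg = itemClusterAvg;
        this.coClusterAvg = coClusterAvg;
    }

    /** Co-cluster mean, corrected by how the user and item deviate from
     *  their respective cluster means. */
    public double predict(int u, int i) {
        int g = userCluster[u];
        int h = itemCluster[i];
        return coClusterAvg[g][h]
            + (userAvg[u] - userClusterAvg[g])
            + (itemAvg[i] - itemClusterAvg[h]);
    }
}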
Thanks
Vineet Yadav

On Tue, Feb 1, 2011 at 10:32 PM, Ted Dunning <te...@gmail.com> wrote:

> Sebastian,
>
> Have you read the Elkan paper?  Are you interested in (partially)
> content-based recommendation?

Re: Recommending on Dynamic Content

Posted by Ted Dunning <te...@gmail.com>.
Sebastian,

Have you read the Elkan paper?  Are you interested in (partially)
content-based recommendation?

On Tue, Feb 1, 2011 at 2:02 AM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Gökhan,
>
> I want to point you to some papers I came across that deal with similar
> problems:
>
> "Google News Personalization: Scalable Online Collaborative Filtering"
> ( http://www2007.org/papers/paper570.pdf ); this paper describes how
> Google uses three algorithms (two of which cluster the users) to achieve
> online recommendation of news articles.
>
> "Feature-based recommendation system"
> ( http://glaros.dtc.umn.edu/gkhome/fetch/papers/fbrsCIKM05.pdf ); this
> approach didn't really convince me, and I think the paper is lacking a
> lot of details, but it might still be an interesting read.
>
> --sebastian

Re: Recommending on Dynamic Content

Posted by Gökhan Çapan <gk...@gmail.com>.
Thanks Sean, Sebastian.

Sean,
a "most similar items" functionality is also critical for me, so a user-based
recommendation approach is not an option. I will read the paper you have
suggested.

Sebastian,

Currently I am clustering the items using MinHash, calling the clusters item
profiles, and applying collaborative filtering to these item profiles. When a
new item arrives, I incrementally find the profile it is most similar to, so
it is immediately recommended wherever its item profile is recommended.
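
To make the MinHash step concrete, here is a sketch under my own
assumptions (an item is represented by the set of IDs of the users who
interacted with it, hashed with k independent hash functions, and the
signature is used as a profile key); the names are invented and this is
not Mahout code:

import java.util.Arrays;
import java.util.Random;
import java.util.Set;

public class ItemMinHasher {

    private static final int PRIME = 2147483647; // large prime modulus

    private final int[] seedsA; // multipliers, one per hash function
    private final int[] seedsB; // offsets, one per hash function

    public ItemMinHasher(int numHashes, Random random) {
        seedsA = new int[numHashes];
        seedsB = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            seedsA[i] = 1 + random.nextInt(PRIME - 1);
            seedsB[i] = random.nextInt(PRIME);
        }
    }

    /** Signature: for each hash function, the minimum hash over the item's
     *  users. Items with identical (or banded) signatures share a profile.
     *  Assumes non-negative user IDs. */
    public int[] signature(Set<Integer> userIds) {
        int[] sig = new int[seedsA.length];
        Arrays.fill(sig, Integer.MAX_VALUE);
        for (int userId : userIds) {
            for (int i = 0; i < sig.length; i++) {
                long h = ((long) seedsA[i] * userId + seedsB[i]) % PRIME;
                sig[i] = Math.min(sig[i], (int) h);
            }
        }
        return sig;
    }
}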


I've read Google's paper; actually, the MinHash idea came from there. (I know
they use MinHash to cluster users, unlike my approach.)

I have also had a look at the feature-based recommendation paper; it seemed
to me that it turns into a content-based recommender, though I haven't read
it in detail.

Anyway, I am also thinking about a feature-based representation of products,
which would combine some item-specific features with historical data.

There is an interesting paper from Yahoo: "Personalized recommendation on
dynamic content using predictive bilinear models" (
www2009.eprints.org/70/1/p691.pdf ).

I think they propose a good model-based approach for recommendation, as well
as a good evaluation methodology. It looks like it could be modified into an
item-based approach.
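
To make "bilinear" concrete, here is a toy sketch of the kind of scoring
such models use: a weight matrix W couples user features with item features,
so a brand-new item can be scored from its feature vector alone. The names
and shapes are my own invention, not the paper's notation:

public class BilinearScorer {

    // coupling matrix, userDim x itemDim, learned offline from history
    private final double[][] w;

    public BilinearScorer(double[][] w) {
        this.w = w;
    }

    /** score(u, i) = userFeatures^T * W * itemFeatures */
    public double score(double[] userFeatures, double[] itemFeatures) {
        double s = 0.0;
        for (int a = 0; a < userFeatures.length; a++) {
            for (int b = 0; b < itemFeatures.length; b++) {
                s += userFeatures[a] * w[a][b] * itemFeatures[b];
            }
        }
        return s;
    }
}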

I will be glad to help if there is a plan to add a model-based recommender
to Mahout.



On Tue, Feb 1, 2011 at 12:02 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Gökhan,
>
> I want to point you to some papers I came across that deal with similar
> problems:
>
> "Google News Personalization: Scalable Online Collaborative Filtering"
> ( http://www2007.org/papers/paper570.pdf ); this paper describes how
> Google uses three algorithms (two of which cluster the users) to achieve
> online recommendation of news articles.
>
> "Feature-based recommendation system"
> ( http://glaros.dtc.umn.edu/gkhome/fetch/papers/fbrsCIKM05.pdf ); this
> approach didn't really convince me, and I think the paper is lacking a
> lot of details, but it might still be an interesting read.
>
> --sebastian


-- 
Gökhan Çapan

Re: Recommending on Dynamic Content

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Gökhan,

I want to point you to some papers I came across that deal with similar
problems:

"Google News Personalization: Scalable Online Collaborative Filtering"
( http://www2007.org/papers/paper570.pdf ); this paper describes how
Google uses three algorithms (two of which cluster the users) to achieve
online recommendation of news articles.
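
If I remember the paper correctly, the third technique (the one that does
not cluster users) is a simple item covisitation count, which copes well
with item churn because a new article starts accumulating counts as soon
as anyone reads it. A rough sketch with invented names, not the paper's
code:

import java.util.HashMap;
import java.util.Map;

public class CovisitationCounter {

    private final Map<Long, Map<Long, Integer>> counts =
        new HashMap<Long, Map<Long, Integer>>();

    /** Record that the same user consumed items a and b close together. */
    public void record(long a, long b) {
        increment(a, b);
        increment(b, a);
    }

    /** Covisitation count between two items, usable as a similarity score. */
    public int count(long a, long b) {
        Map<Long, Integer> row = counts.get(a);
        if (row == null) {
            return 0;
        }
        Integer c = row.get(b);
        return c == null ? 0 : c;
    }

    private void increment(long from, long to) {
        Map<Long, Integer> row = counts.get(from);
        if (row == null) {
            row = new HashMap<Long, Integer>();
            counts.put(from, row);
        }
        Integer c = row.get(to);
        row.put(to, c == null ? 1 : c + 1);
    }
}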

"Feature-based recommendation system" ( 
http://glaros.dtc.umn.edu/gkhome/fetch/papers/fbrsCIKM05.pdf ), this 
approach didn't really convince me and I think the paper is lacking a 
lot of details, but it might still be an interesting read.

--sebastian
