Posted to user@mahout.apache.org by James Donnelly <ja...@gmail.com> on 2015/06/19 11:35:11 UTC

Realtime update of similarity matrices

Hi,

First of all, a big thanks to Ted and Pat, and all the authors and
developers around Mahout.

I'm putting together an eCommerce recommendation framework, and have a
couple of questions from using the latest tools in Mahout 1.0.

I've seen it hinted by Pat that real-time updates (incremental learning)
are made possible with the latest Mahout tools here:

http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/

But once I have gone through the first phase of data processing, I'm not
clear on the basic direction for maintaining the generated data, e.g. with
added products and incremental user behaviour data.

The only way I can see is to update my input data, then re-run the entire
process of generating the similarity matrices using the itemSimilarity and
rowSimilarity jobs.  Is there a better way?

James

Re: Realtime update of similarity matrices

Posted by Pat Ferrel <pa...@occamsmachete.com>.
It sounds like you have a SaaS recommender service and are trying to find a way to use one client’s data for another client? Do your clients object to this?

In any case, correlating items by content seems dubious unless the items are from very similar catalogs. Are you trying to account for not having unified item IDs? I guess if you had two booksellers you might find similar descriptions, since they come from the publisher, but this seems like a long shot.

Content similarity is a longer discussion.



Re: Realtime update of similarity matrices

Posted by Ted Dunning <te...@gmail.com>.
James,

This isn't an answer to your last question ...

You have an excellent summary there.  The only thing you may have missed is
that using cooccurrence/search-based recommendations allows you to improve
results precisely because it gets you out of the business of tweaking
algorithms and into the business of determining which data works better for
your particular situation.  Algorithmic tweaks have very, very limited
upside; getting better data has >100% potential for improvement.  It is
very important to get your recs team out of the first, low-value activity
of tweaking algorithms and into the second, high-value activity of
collecting and evaluating data.



Re: Realtime update of similarity matrices

Posted by James Donnelly <ja...@gmail.com>.
Ted, thanks for the video - enjoyable and insightful.

Gustavo, a good read, and a reminder of how far I have to go.  More maths
later - fun!

Pat, I need to read more and take my time understanding how cut-offs in
LLR-derived co-occurrence can be exploited in practice.  I accept that useful
real-time model updates are an edge case, but I may have to face edge cases.

I mentioned the framework I'm putting together - I didn't mention that
we're a SaaS business.  The product will serve multiple use cases.

The cold start capabilities of the multi-modal approach are appealing.  I
can see content recommendations filling the gap while we build the
user-item model - this won't work for all product types of course.

There are clients whose 'products' are fairly short lived where the initial
burst of user-item interactions would definitely be useful.  I take your
point that small increment sets might not impact the model in most cases.

My takeaway from the responses so far is that the real-time question can
wait until phase n of the project without sacrificing much value.  I'm
looking forward to learning what is possible - I see what you are saying
about the mutable vectors.

The great thing now for me is that I can do an end-to-end proof of concept
mostly by doing framework plumbing.  Maybe I'll look into doing multiple
cross-cooccurrence indicators in one pass via the ItemSimilarityDriver, but
once we get the basics functioning, we'll probably be looking to engage a
Ted or a Pat if we can afford them :D

There is one final challenge for today I have not figured out though.
Let's say I have a new client (client #2), who sells shoes.  Let's say I
have an existing client (client #1), for whom we have captured a million
user-view/purchase interactions.  How can I recommend to client #2 based on
the model built from client #1?

The items in their respective inventories are similar by content, but not
identical.  So I need to map the content similarities across the product
data sets, then via that mapping, apply pseudo-collaborative filtering to
client #2's customers.

Thoughts?

Many thanks for your time once again.




Re: Realtime update of similarity matrices

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Actually Mahout’s item and row similarity calculate the cooccurrence and cross-cooccurrence matrices; a search engine performs the kNN calc to return an ordered list of recs. The search query is the user’s history; the search engine calculates the most similar items from the cooccurrence and cross-cooccurrence matrices by keeping them in different fields, which means there is only one query across several matrices. Solr and Elasticsearch are well known for speed and scalability in serving these queries.

In a hypothetical incremental model we might use the search engine as matrix storage, since an incremental update to the matrix would be indexed in realtime by the engine. The update method Ted mentions is relatively simple and only requires that the cooccurrence matrices be mutable and that two mutable vectors be kept in memory (item/column and user/row interaction counts).
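To make the "one query across several matrices" idea concrete, here is a toy sketch (plain Python, not Solr or Elasticsearch code; the item names and the simple overlap scoring are invented stand-ins for the engine's relevance ranking):

```python
# Each item is a "document" whose fields hold its indicator items from the
# cooccurrence and cross-cooccurrence matrices; one query with the user's
# history scores all fields at once.

def recommend(index, history, top_n=2):
    """index: item -> {field: set of indicator items}
    history: {field: set of items the user interacted with}"""
    scores = {}
    for item, fields in index.items():
        s = 0
        for field, indicators in fields.items():
            # Overlap between user history and this field's indicators,
            # a stand-in for the search engine's relevance score.
            s += len(indicators & history.get(field, set()))
        if s:
            scores[item] = s
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

index = {
    "saddle": {"purchase": {"bike", "helmet"}, "view": {"bike", "lock"}},
    "lock":   {"purchase": {"bike"},           "view": {"bike"}},
    "tent":   {"purchase": {"stove"},          "view": {"sleeping-bag"}},
}
history = {"purchase": {"bike"}, "view": {"bike", "lock"}}
print(recommend(index, history))  # ['saddle', 'lock']
```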


Re: Realtime update of similarity matrices

Posted by Gustavo Frederico <gu...@thinkwrap.com>.
James,

From my days at university I remember reinforcement learning
(https://en.wikipedia.org/wiki/Reinforcement_learning). I suspect
reinforcement learning is interesting to explore for the problem of
e-commerce recommendation. My academic knowledge is really rusty, but it's
one of the few models that represent well the synchronous/asynchronous
problem that we see in e-commerce systems...
The models I'm seeing with Mahout + Solr (by MapR et al.) have Solr do
the work of calculating the co-occurrence indicators. So the fact that Solr
indexes this 'from scratch' during offline learning 'throws the whole
model into the garbage soon' and doesn't leave room for the
optimization/reward step of reinforcement learning. I don't know if someone
could go to the theoretical side and tell us whether there's a 'mapping'
between the reinforcement learning model and traditional off-line
training + on-line testing. Maybe there's a way to shorten the Solr
indexing cycle, but I'm not sure how to 'inject' the reward into the
model... just some thoughts...

cheers

Gustavo




Re: Realtime update of similarity matrices

Posted by Pat Ferrel <pa...@occamsmachete.com>.
It is possible to do, but not implemented anywhere AFAIK. Streaming and online/incremental model calculations are different things. Plain streaming recalculates the model on a moving time window, but does so very often; online/incremental treats the model as a mutable thing and modifies it in place. As you can imagine, they require very different methods. Ted’s reference points out that the internal LLR-weighted cooccurrence calculation can be done online for two reasons: there is a cutoff on the number of cooccurrences, which means many new interactions will not affect the model at all, and LLR is a very simple calculation that involves not the entire row or column vectors but only their non-zero element counts, which are easy to keep in memory (one vector each).
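That "very simple calculation" can be sketched as follows; this is the usual G² log-likelihood-ratio formula (along the lines of Mahout's LogLikelihood utility) computed from a 2x2 table of counts for one item pair:

```python
import math

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    total = sum(counts)
    return xlogx(total) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """k11: both items seen together, k12/k21: one item only, k22: neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

# Independent items score ~0; strongly cooccurring items score high.
print(round(llr(5, 5, 5, 5), 9))                           # 0.0
print(llr(100, 10, 10, 10000) > llr(10, 100, 100, 10000))  # True
```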

It’s relatively simple to set up Mahout’s item and row similarity to take streams and recalculate at rapid intervals; I’ve done this with Kafka-to-Spark streaming input. This uses an entire time window’s worth of data and so is not incremental, but since the calculation is fast and O(n), it can be scaled with the size of the Spark cluster. The cooccurrence and cross-cooccurrence calculation can be done on the public Epinions data on my laptop in 12 minutes, and that is a smallish dataset.
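The windowed recalc can be sketched like this (names are hypothetical; `build_model` stands in for the full cooccurrence computation):

```python
from collections import deque

WINDOW = 4  # keep only the most recent N interactions

window = deque(maxlen=WINDOW)

def tick(new_events, build_model):
    """One streaming interval: old events fall off the left of the window,
    then the whole model is rebuilt from what remains -- frequent, but a
    full recompute, not an in-place update."""
    window.extend(new_events)
    return build_model(list(window))

events = [("u1", "bike"), ("u1", "helmet"), ("u2", "bike"),
          ("u2", "lock"), ("u2", "helmet")]
model = tick(events, build_model=len)  # len() stands in for the real calc
print(model)  # 4 -- only the last WINDOW events enter the model
```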

But may I ask why you want online/incremental? There are only a few edge cases that benefit from it, and as Ted points out, there may be very few interactions that would modify the model at all.

The reasons to update a model are:
1) New items are added. Actually, only when new items have some number of interactions. How often is your item collection changing? If you had a very popular newspaper whose items changed by the minute, this might be a case where very rapid model updates would benefit you.
2) The characteristics of interactions change very rapidly, i.e. users change their preferences very often. I have never personally run into this case, but I imagine there are examples in social media.

The Multimodal recommender can handle new users that have some usage history but were not used in the model calc so new users are not a case where you need incremental model updates.
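The in-place alternative can be sketched too; this is a hypothetical structure (not an existing Mahout feature) showing that one new (user, item) event touches only the pairs involving that item, plus the two in-memory count vectors:

```python
from collections import defaultdict

cooccur = defaultdict(int)       # (item_a, item_b) -> cooccurrence count
item_counts = defaultdict(int)   # item -> total interactions (column counts)
user_history = defaultdict(set)  # user -> items interacted with (row data)

def record(user, item):
    """Apply one interaction in place; only pairs involving `item` change
    and would need their LLR score (and cutoff test) re-evaluated."""
    item_counts[item] += 1
    for prior in user_history[user]:
        if prior != item:
            cooccur[tuple(sorted((prior, item)))] += 1
    user_history[user].add(item)

for user, item in [("u1", "bike"), ("u1", "helmet"),
                   ("u2", "bike"), ("u2", "helmet")]:
    record(user, item)
print(cooccur[("bike", "helmet")])  # 2
```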



Re: Realtime update of similarity matrices

Posted by Ted Dunning <te...@gmail.com>.
The standard approach is to re-run the off-line learning.

It is possible, though not yet supported in Mahout tools, to do real-time
updates.

See here for some details:
https://www.mapr.com/resources/videos/fully-real-time-recommendation-%E2%80%93-ted-dunning-sf-data-mining


