Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2013/09/27 14:48:41 UTC

0.9?

Anyone interested in thinking about 0.9 in the early Nov. time frame?

-Grant

Re: 0.9?

Posted by Suneel Marthi <su...@yahoo.com>.
Yep, I was planning on using Lucene's FST.
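For readers who haven't met them: an FST here is essentially a compact ordered dictionary mapping each term to an output such as an ordinal, with shared prefixes (and, in Lucene, shared suffixes too). A toy Python sketch of the term-to-ordinal behavior only — this is not Lucene's API, just an illustration, and the class name is mine:

```python
class TrieDict:
    """Toy ordered term dictionary: maps each term to an integer id.

    Illustrates the idea behind an FST-backed dictionary (shared
    prefixes, term -> ordinal lookup); Lucene's real FST also shares
    suffixes and encodes outputs on arcs far more compactly.
    """

    def __init__(self, sorted_terms):
        self.root = {}
        for ordinal, term in enumerate(sorted_terms):
            node = self.root
            for ch in term:
                node = node.setdefault(ch, {})
            node["\0"] = ordinal  # sentinel key marks end-of-term

    def lookup(self, term):
        node = self.root
        for ch in term:
            if ch not in node:
                return None
            node = node[ch]
        return node.get("\0")

d = TrieDict(["cat", "category", "dog"])
```

The win over a plain hash map is memory: for a large vocabulary, shared structure keeps the dictionary small while still supporting ordered iteration and exact lookup.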




________________________________
 From: Grant Ingersoll <gs...@apache.org>
To: dev@mahout.apache.org; Suneel Marthi <su...@yahoo.com> 
Sent: Saturday, September 28, 2013 8:41 AM
Subject: Re: 0.9?
 




On Sep 27, 2013, at 9:07 AM, Suneel Marthi <su...@yahoo.com> wrote:

>I was gonna bring this up myself next week (and was chatting with Isabel about it this morning).
>
>I was thinking of the following for 0.9:
>
>1. We have already removed the algorithms that have been marked as deprecated in 0.8.
>2. Bugs that have been fixed since 0.8.
>3. New features in 0.9 could include:
>    a) The new Multilayer Perceptron that Yexi contributed recently and that is presently pending review (don't know the JIRA# off the top of my head).
>    b) Using Finite State Transducers as a dictionary type. I had opened a JIRA for this and can work on it.
> 
>

Are you using Lucene's FSTs for this?

Rest sounds good.


Anything else others would like to add???
>
>Grant, could we have a hangout the week of Oct 7 :) ??
>

I can't that week, but probably the following.


>
>
>
>________________________________
>From: Grant Ingersoll <gs...@apache.org>
>To: "dev@mahout.apache.org" <de...@mahout.apache.org> 
>Sent: Friday, September 27, 2013 8:48 AM
>Subject: 0.9?
>
>
>Anyone interested in thinking about 0.9 in the early Nov. time frame?
>
>-Grant

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com

Re: 0.9?

Posted by Tom Griffin <t....@ieee.org>.
unsubscribe

--------------------------------------------------
*Tom Griffin*
*Director, Innovation*
*
*
*Office: *732-562-6531
*Mobile: *201-259-8860
*Email:* t.p.griffin@ieee.org




On Sat, Sep 28, 2013 at 8:41 AM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Sep 27, 2013, at 9:07 AM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> > I was gonna bring this up myself next week (and was chatting with Isabel
> about it this morning).
> >
> > I was thinking of the following for 0.9:
> >
> > 1. We have already removed the algorithms that have been marked as
> deprecated in 0.8.
> > 2. Bugs that have been fixed since 0.8.
> > 3. New features in 0.9 could include:
> >     a) The new Multilayer Perceptron that Yexi contributed recently and
> that is presently pending review (don't know the JIRA# off the top of my head).
> >     b) Using Finite State Transducers as a dictionary type. I had
> > opened a JIRA for this and can work on it.
> >
>
> Are you using Lucene's FSTs for this?
>
> Rest sounds good.
>
>
> > Anything else others would like to add???
> >
> > Grant, could we have a hangout the week of Oct 7 :) ??
>
> I can't that week, but probably the following.
>
> >
> >
> >
> >
> > ________________________________
> > From: Grant Ingersoll <gs...@apache.org>
> > To: "dev@mahout.apache.org" <de...@mahout.apache.org>
> > Sent: Friday, September 27, 2013 8:48 AM
> > Subject: 0.9?
> >
> >
> > Anyone interested in thinking about 0.9 in the early Nov. time frame?
> >
> > -Grant
>
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
>
>
>
>
>
>

Re: 0.9?

Posted by Grant Ingersoll <gs...@apache.org>.
On Sep 27, 2013, at 9:07 AM, Suneel Marthi <su...@yahoo.com> wrote:

> I was gonna bring this up myself next week (and was chatting with Isabel about it this morning).
> 
> I was thinking of the following for 0.9:
> 
> 1. We have already removed the algorithms that have been marked as deprecated in 0.8.
> 2. Bugs that have been fixed since 0.8.
> 3. New features in 0.9 could include:
>     a) The new Multilayer Perceptron that Yexi contributed recently and that is presently pending review (don't know the JIRA# off the top of my head).
>     b) Using Finite State Transducers as a dictionary type. I had opened a JIRA for this and can work on it.
>  

Are you using Lucene's FSTs for this?

Rest sounds good.


> Anything else others would like to add???
> 
> Grant, could we have a hangout the week of Oct 7 :) ??

I can't that week, but probably the following.

> 
> 
> 
> 
> ________________________________
> From: Grant Ingersoll <gs...@apache.org>
> To: "dev@mahout.apache.org" <de...@mahout.apache.org> 
> Sent: Friday, September 27, 2013 8:48 AM
> Subject: 0.9?
> 
> 
> Anyone interested in thinking about 0.9 in the early Nov. time frame?
> 
> -Grant

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com






Re: 0.9?

Posted by Suneel Marthi <su...@yahoo.com>.
I was gonna bring this up myself next week (and was chatting with Isabel about it this morning).

I was thinking of the following for 0.9:

1. We have already removed the algorithms that have been marked as deprecated in 0.8.
2. Bugs that have been fixed since 0.8.
3. New features in 0.9 could include:
    a) The new Multilayer Perceptron that Yexi contributed recently and that is presently pending review (don't know the JIRA# off the top of my head).
    b) Using Finite State Transducers as a dictionary type. I had opened a JIRA for this and can work on it.
 
Anything else others would like to add???

Grant, could we have a hangout the week of Oct 7 :) ??




________________________________
 From: Grant Ingersoll <gs...@apache.org>
To: "dev@mahout.apache.org" <de...@mahout.apache.org> 
Sent: Friday, September 27, 2013 8:48 AM
Subject: 0.9?
 

Anyone interested in thinking about 0.9 in the early Nov. time frame?

-Grant

Re: 0.9?

Posted by Isabel Drost-Fromm <is...@apache.org>.
On Monday, September 30, 2013 11:21:30 AM Grant Ingersoll wrote:
> This sounds good to the extent we can get them done.  Do you have JIRA
> issues for any of these open?  November isn't hard and fast for 0.9, but I
> suspect it will be January if we push things out.

We could also think about including what seems feasible by November in 0.9 
and pushing the rest to the 1.0 release.

The advantage I see with getting 0.9 out the door quickly is that most likely 
this will give us more feedback on whether we are on the right track 
concerning deletions and cleanups. Understandably several users seem to switch 
to newer code versions only once they are officially released...

Isabel


Re: 0.9?

Posted by Ted Dunning <te...@gmail.com>.
Humble beginnings are excellent for building involvement.




On Mon, Sep 30, 2013 at 10:35 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> On Sat, Sep 28, 2013 at 10:59 AM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > The one large-ish feature that I think would find general use would be a
> > high performance classifier trainer.
> >
> > For the cleanup sort of thing, it would be good to fully integrate the
> > streaming k-means into the normal clustering commands while revamping the
> > command line API.
> >
> > Dmitriy's recent scala work would help quite a bit before 1.0. Not sure
> it
> > can make 0.9.
> >
>
> Yeah, sorry, I've been distracted: I have a fairly sizeable focus on a data
> mining project at the office and I have a baby on the way (in 2 weeks max).
> Not sure I can devote as much as I really wanted, but my company signed
> off on at least Spark DRM for Mahout.
>
> Another thing here is that Spark 0.9 is going to have GraphX which is
> supposed to be smarter about partitioning of the graphs (and perhaps skewed
> graphs) so I'd be eager to rewrite matrix algorithms for GraphX rather than
> using the current limited Bagel capability. Perhaps it makes sense to commit
> the current Bagel DRM humble beginnings so that more people could give me a
> hand on this (perhaps Nick P.?)
>
>
>
>
> > For recommendations, I think that the demo system that Pat started with
> > the elaborations by Ellen and Tim would be very good to have.
> >
> > I would be happy to collaborate with somebody on these but am not at all
> > likely to have time to actually do them end to end.
> >
> > Sent from my iPhone
> >
> > On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:
> >
> > > Moving closer to 1.0, removing cruft, etc.  Do we have any more major
> > features planned for 1.0?  I think we said during 0.8 that we would try
> to
> > follow pretty quickly w/ another release.
> > >
> > > -Grant
> > >
> > > On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com>
> wrote:
> > >
> > >> Sounds right in principle but perhaps a bit soon.
> > >>
> > >> What would define the release?
> > >>
> > >> Sent from my iPhone
> > >>
> > >> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org>
> wrote:
> > >>
> > >>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
> > >>>
> > >>> -Grant
> > >
> > > --------------------------------------------
> > > Grant Ingersoll | @gsingers
> > > http://www.lucidworks.com
> > >
> > >
> > >
> > >
> > >
> >
>

Re: 0.9?

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Sat, Sep 28, 2013 at 10:59 AM, Ted Dunning <te...@gmail.com> wrote:

> The one large-ish feature that I think would find general use would be a
> high performance classifier trainer.
>
> For the cleanup sort of thing, it would be good to fully integrate the
> streaming k-means into the normal clustering commands while revamping the
> command line API.
>
> Dmitriy's recent scala work would help quite a bit before 1.0. Not sure it
> can make 0.9.
>

Yeah, sorry, I've been distracted: I have a fairly sizeable focus on a data
mining project at the office and I have a baby on the way (in 2 weeks max).
Not sure I can devote as much as I really wanted, but my company signed
off on at least Spark DRM for Mahout.

Another thing here is that Spark 0.9 is going to have GraphX which is
supposed to be smarter about partitioning of the graphs (and perhaps skewed
graphs) so I'd be eager to rewrite matrix algorithms for GraphX rather than
using the current limited Bagel capability. Perhaps it makes sense to commit
the current Bagel DRM humble beginnings so that more people could give me a
hand on this (perhaps Nick P.?)




> For recommendations, I think that the demo system that Pat started with
> the elaborations by Ellen and Tim would be very good to have.
>
> I would be happy to collaborate with somebody on these but am not at all
> likely to have time to actually do them end to end.
>
> Sent from my iPhone
>
> On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:
>
> > Moving closer to 1.0, removing cruft, etc.  Do we have any more major
> features planned for 1.0?  I think we said during 0.8 that we would try to
> follow pretty quickly w/ another release.
> >
> > -Grant
> >
> > On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
> >
> >> Sounds right in principle but perhaps a bit soon.
> >>
> >> What would define the release?
> >>
> >> Sent from my iPhone
> >>
> >> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
> >>
> >>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
> >>>
> >>> -Grant
> >
> > --------------------------------------------
> > Grant Ingersoll | @gsingers
> > http://www.lucidworks.com
> >
> >
> >
> >
> >
>

Re: Mahout Jira

Posted by Stevo Slavić <ss...@gmail.com>.
Works for me. Maybe you've been affected by some of the issues from the
recent JIRA upgrade (see https://twitter.com/infrabot).
Consider clearing your browser cache and cookies; maybe that will help.

Kind regards,
Stevo Slavic.


On Mon, Sep 30, 2013 at 5:27 PM, Suneel Marthi <su...@yahoo.com> wrote:

> JIRA's been down for some time now; the link below is not working.
>
>
> https://issues.apache.org/jira/browse/MAHOUT

Mahout Jira

Posted by Suneel Marthi <su...@yahoo.com>.
JIRA's been down for some time now; the link below is not working.


https://issues.apache.org/jira/browse/MAHOUT

Re: 0.9?

Posted by Ted Dunning <te...@gmail.com>.
I don't think any of those are JIRA-ized yet.  Dmitriy may have done some
of that for his part.




On Mon, Sep 30, 2013 at 8:21 AM, Grant Ingersoll <gs...@apache.org> wrote:

> Hi Ted,
>
> This sounds good to the extent we can get them done.  Do you have JIRA
> issues for any of these open?  November isn't hard and fast for 0.9, but I
> suspect it will be January if we push things out.
>
> -Grant
>
> On Sep 28, 2013, at 1:59 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > The one large-ish feature that I think would find general use would be a
> high performance classifier trainer.
> >
> > For the cleanup sort of thing, it would be good to fully integrate the
> streaming k-means into the normal clustering commands while revamping the
> command line API.
> >
> > Dmitriy's recent scala work would help quite a bit before 1.0. Not sure
> it can make 0.9.
> >
> > For recommendations, I think that the demo system that Pat started with
> the elaborations by Ellen and Tim would be very good to have.
> >
> > I would be happy to collaborate with somebody on these but am not at all
> likely to have time to actually do them end to end.
> >
> > Sent from my iPhone
> >
> > On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:
> >
> >> Moving closer to 1.0, removing cruft, etc.  Do we have any more major
> features planned for 1.0?  I think we said during 0.8 that we would try to
> follow pretty quickly w/ another release.
> >>
> >> -Grant
> >>
> >> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >>
> >>> Sounds right in principle but perhaps a bit soon.
> >>>
> >>> What would define the release?
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
> >>>
> >>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
> >>>>
> >>>> -Grant
> >>
> >> --------------------------------------------
> >> Grant Ingersoll | @gsingers
> >> http://www.lucidworks.com
> >>
> >>
> >>
> >>
> >>
>
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
>
>
>
>
>
>

Re: 0.9?

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Ted,

This sounds good to the extent we can get them done.  Do you have JIRA issues for any of these open?  November isn't hard and fast for 0.9, but I suspect it will be January if we push things out.

-Grant

On Sep 28, 2013, at 1:59 PM, Ted Dunning <te...@gmail.com> wrote:

> The one large-ish feature that I think would find general use would be a high performance classifier trainer.  
> 
> For the cleanup sort of thing, it would be good to fully integrate the streaming k-means into the normal clustering commands while revamping the command line API.
> 
> Dmitriy's recent scala work would help quite a bit before 1.0. Not sure it can make 0.9. 
> 
> For recommendations, I think that the demo system that Pat started with the elaborations by Ellen and Tim would be very good to have. 
> 
> I would be happy to collaborate with somebody on these but am not at all likely to have time to actually do them end to end. 
> 
> Sent from my iPhone
> 
> On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:
> 
>> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another release.
>> 
>> -Grant
>> 
>> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
>> 
>>> Sounds right in principle but perhaps a bit soon.  
>>> 
>>> What would define the release?
>>> 
>>> Sent from my iPhone
>>> 
>>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
>>> 
>>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>> 
>>>> -Grant
>> 
>> --------------------------------------------
>> Grant Ingersoll | @gsingers
>> http://www.lucidworks.com
>> 
>> 
>> 
>> 
>> 

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com






Re: Solr-recommender

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Oct 9, 2013 at 12:54 PM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:

> On 10/9/13 3:08 PM, Pat Ferrel wrote:
>
>> Solr uses cosine similarity for its queries. The implementation on
>> github uses Mahout LLR for calculating the item-item similarity matrix but
>> when you do the more-like-this query at runtime Solr uses cosine. This can
>> be fixed in Solr, not sure how much work.
>>
> It's not clear to me whether it's worth "fixing" this or not.  It would
> certainly complicate scoring calculations when mixing with traditional
> search terms.


I am pretty convinced it is not worth fixing.

This is particularly true because when you fix one count at 1 and take the
limiting form of LLR, you get something quite similar to LLR in any case.
 This means that Solr's current query is very close to what we want
theoretically ... certainly at least as close as theory is to practice.
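For reference, the LLR discussed here is the G² statistic over the 2×2 table of cooccurrence counts (k11 = both items, k12/k21 = one item only, k22 = neither). A minimal Python sketch using the entropy formulation — helper names are mine, though as far as I recall this mirrors the shape of Mahout's LogLikelihood.logLikelihoodRatio:

```python
from math import log

def xlogx(x):
    """x * ln(x), with the 0 * ln(0) = 0 convention."""
    return x * log(x) if x > 0 else 0.0

def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 cooccurrence table."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return 2.0 * (row_entropy + col_entropy - mat_entropy)
```

When the table is consistent with independence the statistic is 0, and it grows as cooccurrence departs from what the margins predict, which is what makes it a good sparsification filter for the item-item matrix.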

Re: Solr-recommender

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Oct 9, 2013 at 2:07 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> 2) What you are doing is something else that I was calling a shopping-cart
> recommender. You are using the item-set in the current cart and finding
> similar, what, items? A different way to tackle this is to store all other
> shopping carts then use the current cart contents as a more-like-this query
> against past carts. This will give you items-purchased-together by other
> users. If you have enough carts it might give even better results. In any
> case they will be different.
>


Or the shopping cart can be used as a query for the current indicator
fields.  That gives you an item-based recommendation from shopping cart
contents.

I am not sure that the more-like-this query buys all that much versus an
ordinary query on the indicator fields.
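As a toy illustration of an "ordinary query on the indicator fields": each item document carries a field of its significantly cooccurring items (e.g. LLR-filtered), and the cart becomes an OR query over that field. Everything below is made up for illustration — field name, data, and the raw-overlap scoring that stands in for Solr's own ranking:

```python
# Each catalog item carries an 'indicators' field: items that
# cooccur significantly with it. We score candidate items by how
# many cart items appear among their indicators.
index = {
    "beer":    {"indicators": {"chips", "diapers"}},
    "chips":   {"indicators": {"beer", "salsa"}},
    "salsa":   {"indicators": {"chips"}},
    "diapers": {"indicators": {"beer"}},
}

def recommend(cart, index, n=2):
    scores = {}
    for item, doc in index.items():
        if item in cart:
            continue  # don't recommend what's already in the cart
        overlap = len(doc["indicators"] & set(cart))
        if overlap:
            scores[item] = overlap
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

In Solr this is just `indicators:(beer OR chips)` against the item index, with the search engine's scoring replacing the overlap count.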

Re: Solr-recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
1) Using the user history for the current user in a more-like-this query against the item-item similarity matrix will produce a user-history based recommendation. Simply fetching the item-item similarity row for a particular item will give you the item-similarity based recs with no account of user history. One could imagine a user-user similarity setup, but that's not what we did.

2) What you are doing is something else that I was calling a shopping-cart recommender. You are using the item-set in the current cart and finding similar, what, items? A different way to tackle this is to store all other shopping carts then use the current cart contents as a more-like-this query against past carts. This will give you items-purchased-together by other users. If you have enough carts it might give even better results. In any case they will be different.

https://github.com/pferrel/solr-recommender
But if you already have the item-item similarity matrix indexed, this project won't add much. If you have purchase events and view-details events IDed by user, you might try out the cross-recommender part. We've been searching for a data set to try this on. 
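Pat's past-carts variant (the current cart as a more-like-this query against stored carts) can be sketched in toy form — data and names here are hypothetical, and raw overlap counts stand in for Solr's MLT scoring:

```python
# Store past carts as "documents"; rank them by overlap with the
# current cart, then recommend their remaining items.
past_carts = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "eggs"},
    {"wine", "cheese"},
]

def cart_recommend(current, past_carts, n=3):
    current = set(current)
    # Most similar past carts first (overlap is the toy score).
    scored = sorted(past_carts,
                    key=lambda c: len(c & current),
                    reverse=True)
    recs = []
    for cart in scored:
        if not cart & current:
            break  # no remaining cart shares anything with ours
        for item in sorted(cart - current):
            if item not in recs:
                recs.append(item)
    return recs[:n]
```

This yields items-purchased-together by other users, which, as the message above notes, will differ from the item-based indicator query even on the same data.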

On Oct 9, 2013, at 12:54 PM, Michael Sokolov <ms...@safaribooksonline.com> wrote:

On 10/9/13 3:08 PM, Pat Ferrel wrote:
> Solr uses cosine similarity for its queries. The implementation on github uses Mahout LLR for calculating the item-item similarity matrix but when you do the more-like-this query at runtime Solr uses cosine. This can be fixed in Solr, not sure how much work.
It's not clear to me whether it's worth "fixing" this or not.  It would certainly complicate scoring calculations when mixing with traditional search terms.
> 
> It sounds like you are doing item-item similarities for recommendations, not actually calculating user-history based recs, is that true?
Yes that's true so far.  Our recommender system has the ability to provide recs based on user history, but we have not deployed this in our app yet.  My plan was simply to query based on all the items in the user's "basket" - not sure that this would require a different back end?  We're not at the moment considering user-user similarity measures.
> 
> You bring up a point that we're finding. I'm not so sure we need or want a recommender query API that is separate from the Solr query API. What we are doing on our demo site is putting the output of the Solr-recommender where Solr can index it. Our web app framework then allows very flexible queries against Solr, using simple user history, producing the typical user-history based recommendations, or mixing/boosting based on metadata or contextual data. If we leave the recommender query API in Solr we get web app framework integration for free.
> 
> Another point is where the data is stored for the running system. If we allow Solr to index from any storage service that it supports then we also get free integration with most any web app framework and storage service. For the demo site we put the data in a DB and have Solr index it from there. We also store the user history and metadata there. This is supported by most web app frameworks out of the box. You could go a different route and use almost any storage system/file system/content format since Solr supports a wide variety.
> 
> Given a fully flexible Solr standard query and indexing scheme all you need do is tweak the query or data source a bit and you have an item-set recommender (shopping cart) or a contextual recommender (for example boost recs from a category) or a pure metadata/content based recommender.
> 
> If the query and storage is left to Solr+web app framework then the github version is complete if not done. Solr still needs LLR in the more-like-this queries. Term weights to encode strength scores would also be nice and I agree that both of these could use some work.
I would like to take a look at that version - I may have missed some discussion about it; would you posting a link please?
> 
> BTW lest we forget this does not imply the Solr-recommender is better than Myrrix or the Mahout-only recommenders. There needs to be some careful comparison of results. Michael, did you do offline or A/B tests during your implementation?

I ran some offline tests using our historical data, but I don't have a lot of faith in these beyond the fact they indicate we didn't make any obvious implementation errors.  We haven't attempted A/B testing yet since our site is so new, and we need to get a meaningful baseline going and sort out a lot of other more pressing issues on the site - recommendations are only one piece, albeit an important one.


Actually, there was an interesting article posted recently about the difficulty of comparing results across systems in this field: http://www.docear.org/2013/09/23/research-paper-recommender-system-evaluation-a-quantitative-literature-survey/ but that's no excuse not to do better. I'll certainly share when I know more :)

-Mike
> 
> On Oct 9, 2013, at 6:13 AM, Michael Sokolov <ms...@safaribooksonline.com> wrote:
> 
> Just to add a note of encouragement for the idea of better integration between Mahout and Solr:
> 
> On safariflow.com, we've recently converted our recommender, which computes similarity scores w/Mahout, from storing scores and running queries w/Postgres, to doing all that in Solr.  It's been a big improvement, both in terms of indexing speed, and more importantly, the flexibility of the queries we can write.  I believe that having scoring built in to the query engine is a key feature for recommendations.  More and more I am coming to believe that recommendation should just be considered as another facet of search: as one among many variables the system may take into account when presenting relevant information to the user.  In our system, we still clearly separate search from recommendations, and we probably will always do that to some extent, but I think we will start to blend the queries more so that there will be essentially a continuum of query options including more or less "user preference" data.
> 
> I think what I'm talking about may be a bit different than what Pat is describing (in implementation terms), since we do LLR calculations off-line in Mahout and then bulk load them into Solr.  We took one of Ted's earlier suggestions to heart, and simply ignored the actual numeric scores: we index the top N similar items for each item.  Later we may incorporate numeric scores in Solr as term weights.  If people are looking for things to do :) I think that would be a great software contribution that could spur this effort onward since it's difficult to accomplish right now given the Solr/Lucene indexing interfaces, but is already supported by the underlying data model and query engine.
> 
> 
> -Mike
> 
> On 10/2/13 12:19 PM, Pat Ferrel wrote:
>> Excellent. From Ellen's description the first Music use may be an implicit preference based recommender using synthetic data? I'm quickly discovering how flexible Solr use is in many of these cases.
>> 
>> Here's another use you may have thought of:
>> 
>> Shopping cart recommenders, as goes the intuition, are best modeled as recommending from similar item-sets. If you store all shopping carts as your training data (play lists, watch lists etc.) then as a user adds things to their cart you query for the most similar past carts. Combine the results intelligently and you'll have an item set recommender. Solr is built to do this item-set similarity. We tried to do this for an ecom site with pure Mahout but the similarity calc in real time stymied us. We knew we'd need Solr but couldn't devote the resources to spin it up.
>> 
>> On the Con-side Solr has a lot of stuff you have to work around. It also does not have the ideal similarity measure for many uses (cosine is ok but llr would probably be better). You don't want stop word filtering, stemming, white space based tokenizing or n-grams. You would like explicit weighting. A good thing about Solr is how well it integrates with virtually any doc store independent of the indexing and query. A bit of an oval peg for a round hole.
>> 
>> It looks like the similarity code is replaceable if not pluggable. Much of the rest could be trimmed away by config or adherence to conventions I suspect. In the demo site I'm working on I've had to adopt some slightly hacky conventions that I'll describe some day.
>> 
>> On Oct 1, 2013, at 10:38 PM, Ted Dunning <te...@gmail.com> wrote:
>> 
>> 
>> Pat,
>> 
>> Ellen and some folks in Britain have been working with some data I produced from synthetic music fans.
>> 
>> 
>> On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>> Hi Ellen,
>> 
>> 
>> On Oct 1, 2013, at 12:38 PM, Ted Dunning <te...@gmail.com> wrote:
>> 
>> 
>> As requested,
>> 
>> Pat, meet Ellen.
>> 
>> Ellen, meet Pat.
>> 
>> 
>> 
>> 
>> On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <pa...@gmail.com> wrote:
>> Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.
>> 
>> Things to note:
>> 1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity job. Currently there is only cooccurrence for sparsification, which is far from optimal. This might take the form of a cross RSJ with two DRMs as input. I can't commit to this but would commit to adding it to the XRecommenderJob.
>> 2) output to Solr needs a lot of options implemented and tested. The hand-run test should be made into some junits. I'm slowly doing this.
>> 3) The Solr query API is unimplemented unless someone else is working on that. I'm building one in a demo site, but it looks to me like a static recommender API is not going to be all that useful, and maybe a document describing how to do it with the Solr query interface would be best, especially for a first step. The reasoning here is that it is so tempting to mix metadata into the recommendation query that a static API is not so obvious. For the demo site the recommender API will be prototyped in a bunch of ways using models and controllers in Rails. If I'm the one to do a Java Solr-recommender query API, it will be after experimenting a bit.
>> 
>> Can someone introduce me to Ellen and Tim?
>> 
>> On Sep 28, 2013, at 10:59 AM, Ted Dunning <te...@gmail.com> wrote:
>> 
>> The one large-ish feature that I think would find general use would be a high performance classifier trainer.
>> 
>> For the cleanup sort of thing, it would be good to fully integrate the streaming k-means into the normal clustering commands while revamping the command line API.
>> 
>> Dmitriy's recent scala work would help quite a bit before 1.0. Not sure it can make 0.9.
>> 
>> For recommendations, I think that the demo system that Pat started with the elaborations by Ellen and Tim would be very good to have.
>> 
>> I would be happy to collaborate with somebody on these but am not at all likely to have time to actually do them end to end.
>> 
>> Sent from my iPhone
>> 
>> On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:
>> 
>>> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another release.
>>> 
>>> -Grant
>>> 
>>> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
>>> 
>>>> Sounds right in principle but perhaps a bit soon.
>>>> 
>>>> What would define the release?
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
>>>> 
>>>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>>> 
>>>>> -Grant
>>> --------------------------------------------
>>> Grant Ingersoll | @gsingers
>>> http://www.lucidworks.com
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
> 




Feedback from presentation[was: Re: Solr-recommender]

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hi Pat,
just as a side note. The solr-recommender was considered "some pretty hot shit" after my presentation.

It would be helpful if the github repository could be run directly without a custom build of Mahout 0.9-SNAPSHOT. That was not the case for me; I had to build Mahout from the SVN sources myself.

It seems that the Jenkins job publishes the Mahout artifacts to the Apache snapshots repository:
https://repository.apache.org/content/groups/snapshots/org/apache/mahout/

Further, this repository is part of the solr-recommender pom.xml:
https://github.com/pferrel/solr-recommender/blob/master/pom.xml

Seems that I did something wrong.
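For reference, a repositories stanza like the following in the consuming pom.xml should let Maven resolve those snapshot artifacts — a sketch only: the id and name are arbitrary, and the URL is the snapshots group linked above:

```xml
<repositories>
  <repository>
    <id>apache.snapshots</id>
    <name>Apache Snapshots Repository</name>
    <url>https://repository.apache.org/content/groups/snapshots</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>
```

If the stanza is already present, a stale local cache can still mask it; `mvn -U` forces a snapshot re-check.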

Thanks a lot
    Manuel


On 26.10.2013 at 19:44, Pat Ferrel wrote:

> Three areas need work:
> 1) The script with sample data that is in the project should be converted into a junit.
> 2) The current use of the Mahout RecommenderJob and various other bits of Mahout need to be updated to the latest 0.9 candidate (I'm working on this and expect to have it up-to-date before 0.9 is released)
> 3) An example demo site with Solr needs to be built. I'm doing one, some of Ted's group is doing another. Neither will be completely public I think so another example with sample data would be super helpful.
> 
> If you or someone else wants to help with #1 or #2 just fork the repo, let us know what you're doing, and create a push request when you're ready. It's under the Apache license like Mahout. If you want to do #3 I'll provide any help I can. Ping me if you'd like to discuss any of this.
> 
> I'll update the JIRA with progress on #2
> 
> ------------------------------
> 
> I've said it before but would love to hear what others think; the rest of the implementation is simply integrating an app framework with Solr and finding some data. Therefore I'm proceeding with that.
> 
> What the github project does is prepare data, run the RecommenderJob and the XRecommenderJob (a cross recommender for multiple actions by users that I built from Mahout DRM jobs) to create the item-item similarity matrix as well as the cross-action similarity matrix. The project then outputs Solr-digestible CSV files with the originally ingested item and user ids. 
> 
> What I am doing for the demo site is:
> 1) Mining and updating a sample data set from RottenTomatoes.com critics' reviews. The data set is user id (critic), item id (video), preference (thumbs up or down) as well as a video catalog--working
> 2) Indexing the similarity matrix produced by the github project with Solr--working
> 3) Gathering user preferences; I'm doing this with a Web UI--working but not deployed
> 4) Using user preferences as a more-like-this query against the output of the github project. This will produce realtime recommendations from the critic review training data--not implemented yet
> 
> The actual query and indexing are from code in the app framework. This fits with the architecture in Ted's docs but I've chosen a general-purpose app framework for the demo, not Liquid Search. #3 of the areas needing work could use Liquid Search or some other app framework to make Solr results visible, but you would need data.
> 
> I have a sample app in early stages at https://guide.finderbots.com/users/login (uname: guest@finderbots.com, pword: find3rbots). It currently caches poster images the first time they are fetched from RT, so it will often be slow. It's showing item-item similarities. When you look at a video detail it shows thumbs of 10 similar videos. Since it uses critics for preferences the similar videos are somewhat surprising.
> 
> Take it easy on the app, it's running in my bedroom closet.
> 
> On Oct 24, 2013, at 10:49 PM, Manuel Blechschmidt <Ma...@gmx.de> wrote:
> 
> Hi Dominik,
the most important document is on Ted Dunning's Google drive:
> 
> https://drive.google.com/folderview?id=0B7t2iY7e93hUNkJSbUtnd1kxUU0&usp=sharing
> 
> Design Document
> 
> Here is the corresponding JIRA entry:
> https://issues.apache.org/jira/browse/MAHOUT-1288
> 
And here is Pat's github repo:
> https://github.com/pferrel/solr-recommender
> 
> 
> Am 25.10.2013 um 01:55 schrieb Dominik Hübner:
> 
>> Having seen Ted presenting recommendation as search at the Munich Hadoop meetup, I remembered the new Solr recommender implemented by Pat. Are there any chances to contribute? I currently have some spare time, but could not find the related JIRA entry.
>> 
> 
> -- 
> Manuel Blechschmidt
> M.Sc. IT Systems Engineering
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
> 
> 

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B


Re: Solr-recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Three areas need work:
1) The script with sample data that is in the project should be converted into a JUnit test.
2) The current use of the Mahout RecommenderJob and various other bits of Mahout needs to be updated to the latest 0.9 candidate (I'm working on this and expect to have it up-to-date before 0.9 is released)
3) An example demo site with Solr needs to be built. I'm doing one, some of Ted's group is doing another. Neither will be completely public, I think, so another example with sample data would be super helpful.

If you or someone else wants to help with #1 or #2, just fork the repo, let us know what you're doing, and create a pull request when you're ready. It's under the Apache license like Mahout. If you want to do #3 I'll provide any help I can. Ping me if you'd like to discuss any of this.

I'll update the JIRA with progress on #2

------------------------------

I've said it before but would love to hear what others think; the rest of the implementation is simply integrating an app framework with Solr and finding some data. Therefore I'm proceeding with that.

What the github project does is prepare data, run the RecommenderJob and the XRecommenderJob (a cross recommender for multiple actions by users that I built from Mahout DRM jobs) to create the item-item similarity matrix as well as the cross-action similarity matrix. The project then outputs Solr-digestible CSV files with the originally ingested item and user ids.

What I am doing for the demo site is:
1) Mining and updating a sample data set from RottenTomatoes.com critics' reviews. The data set is user id (critic), item id (video), preference (thumbs up or down) as well as a video catalog--working
2) Indexing the similarity matrix produced by the github project with Solr--working
3) Gathering user preferences; I'm doing this with a Web UI--working but not deployed
4) Using user preferences as a more-like-this query against the output of the github project. This will produce realtime recommendations from the critic review training data--not implemented yet
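To make step 4 concrete, here is a minimal sketch of turning a user's history into a Solr query string. The field name `indicators` and the cap on query terms are assumptions for illustration, not details of the actual project:

```python
def history_to_query(history_items, field="indicators", max_terms=20):
    """Build a Solr more-like-this-style query string from a user's
    recent item ids. Capping the number of terms keeps the query cheap
    and biased toward recent history (an assumption, not project code)."""
    terms = " ".join(history_items[:max_terms])
    return "%s:(%s)" % (field, terms)

# e.g. history_to_query(["tt0371746", "tt0800369"])
# -> 'indicators:(tt0371746 tt0800369)'
```

The app framework would pass this string to Solr's standard query handler, which is exactly why a separate recommender query API may not be needed.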

The actual query and indexing are from code in the app framework. This fits with the architecture in Ted's docs but I've chosen a general-purpose app framework for the demo, not Liquid Search. #3 of the areas needing work could use Liquid Search or some other app framework to make Solr results visible, but you would need data.

I have a sample app in early stages at https://guide.finderbots.com/users/login (uname: guest@finderbots.com, pword: find3rbots). It currently caches poster images the first time they are fetched from RT, so it will often be slow. It's showing item-item similarities. When you look at a video detail it shows thumbs of 10 similar videos. Since it uses critics for preferences the similar videos are somewhat surprising.

Take it easy on the app, it's running in my bedroom closet.

On Oct 24, 2013, at 10:49 PM, Manuel Blechschmidt <Ma...@gmx.de> wrote:

Hi Dominik,
the most important document is on Ted Dunning's Google drive:

https://drive.google.com/folderview?id=0B7t2iY7e93hUNkJSbUtnd1kxUU0&usp=sharing

Design Document

Here is the corresponding JIRA entry:
https://issues.apache.org/jira/browse/MAHOUT-1288

And here is Pat's github repo:
https://github.com/pferrel/solr-recommender


Am 25.10.2013 um 01:55 schrieb Dominik Hübner:

> Having seen Ted presenting recommendation as search at the Munich Hadoop meetup, I remembered the new Solr recommender implemented by Pat. Are there any chances to contribute? I currently have some spare time, but could not find the related JIRA entry.
> 

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B



Re: Solr-recommender

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hi Dominik,
the most important document is on Ted Dunning's Google drive:

https://drive.google.com/folderview?id=0B7t2iY7e93hUNkJSbUtnd1kxUU0&usp=sharing

Design Document

Here is the corresponding JIRA entry:
https://issues.apache.org/jira/browse/MAHOUT-1288

And here is Pat's github repo:
https://github.com/pferrel/solr-recommender


Am 25.10.2013 um 01:55 schrieb Dominik Hübner:

> Having seen Ted presenting recommendation as search at the Munich Hadoop meetup, I remembered the new Solr recommender implemented by Pat. Are there any chances to contribute? I currently have some spare time, but could not find the related JIRA entry.
> 

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B


Re: Solr-recommender

Posted by Dominik Hübner <co...@dhuebner.com>.
Having seen Ted presenting recommendation as search at the Munich Hadoop meetup, I remembered the new Solr recommender implemented by Pat. Are there any chances to contribute? I currently have some spare time, but could not find the related JIRA entry.


Re: Solr-recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
The issue of offline tests is often misunderstood I suspect. While I agree with Ted it might do to explain a bit.

For myself I'd say offline testing is a requirement but not for comparing two disparate recommenders. Companies like Amazon and Netflix, as well as others on record, have a workflow that includes offline testing and comparison against previous versions of their own code and their own gold data set. These comparisons can be quite useful, if only in pointing to otherwise obscure bugs. If they see a difference in two offline tests they ask, why? Then when they think they have an optimal solution they do A/B tests as challenger/champion competitions and it's these that are the only reliable measure of goodness.

I do agree that comparing two recommenders with offline tests is dubious at best, as the paper points out. But put yourself in the place of a company new to recommenders that has several to choose from, maybe even versions of the same recommender with different tuning parameters. Do the offline tests with a standard set of your own data and pick the best to start with. What other choice do you have? Maybe flexibility or architecture trumps the offline tests; if not, then using them is better than a random choice. Take this result with a grain of salt, though, and get ready to A/B test later challengers when or if you have time.

In the case of the Solr recommender it is extremely flexible and online (realtime results). These features for me trump any offline tests against alternatives. But the demo site will include offline Mahout recommendations for comparison, and in the unlikely event that it gets any traffic, will incorporate A/B tests.
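For the offline leg of that workflow, the usual shape of the test is hold-out precision: train on most of each user's history, hide the rest, and score how many of the hidden items the recommender recovers. A minimal sketch follows; all names here are illustrative, not Mahout's evaluator API:

```python
def precision_at_k(recommended, held_out, k=10):
    """Fraction of the top-k recommendations that appear in the
    user's held-out interactions."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in held_out)
    return hits / float(len(top_k))

def mean_precision_at_k(recs_by_user, held_out_by_user, k=10):
    """Average precision@k over all users that have held-out data."""
    users = [u for u in recs_by_user if held_out_by_user.get(u)]
    if not users:
        return 0.0
    return sum(precision_at_k(recs_by_user[u], held_out_by_user[u], k)
               for u in users) / float(len(users))
```

Running the same script against the output of two versions of the same recommender on the same gold data is the champion/challenger comparison described above; it points at bugs and regressions, not at which of two disparate recommenders is "better".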

On Oct 9, 2013, at 4:29 PM, Ted Dunning <te...@gmail.com> wrote:


On Wed, Oct 9, 2013 at 12:54 PM, Michael Sokolov <ms...@safaribooksonline.com> wrote:

BTW lest we forget this does not imply the Solr-recommender is better than Myrrix or the Mahout-only recommenders. There needs to be some careful comparison of results. Michael, did you do offline or A/B tests during your implementation?

I ran some offline tests using our historical data, but I don't have a lot of faith in these beyond the fact they indicate we didn't make any obvious implementation errors.  We haven't attempted A/B testing yet since our site is so new, and we need to get a meaningful baseline going and sort out a lot of other more pressing issues on the site - recommendations are only one piece, albeit an important one.


Actually there was an interesting idea for an article posted recently about the difficulty of comparing results across systems in this field: http://www.docear.org/2013/09/23/research-paper-recommender-system-evaluation-a-quantitative-literature-survey/ but that's no excuse not to do better.  I'll certainly share when I know more :)

I tend to be a pessimist with regard to off-line evaluation.  It is fine to do, but if a system is anywhere near best, I think that it should be considered for A/B testing.




Re: Solr-recommender

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Oct 9, 2013 at 12:54 PM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:

>
>> BTW lest we forget this does not imply the Solr-recommender is better
>> than Myrrix or the Mahout-only recommenders. There needs to be some careful
>> comparison of results. Michael, did you do offline or A/B tests during your
>> implementation?
>>
>
> I ran some offline tests using our historical data, but I don't have a lot
> of faith in these beyond the fact they indicate we didn't make any obvious
> implementation errors.  We haven't attempted A/B testing yet since our site
> is so new, and we need to get a meaningful baseline going and sort out a
> lot of other more pressing issues on the site - recommendations are only
> one piece, albeit an important one.
>
>
> Actually there was an interesting idea for an article posted recently
> about the difficulty of comparing results across systems in this field:
> http://www.docear.org/2013/09/23/research-paper-recommender-system-evaluation-a-quantitative-literature-survey/
> but that's no excuse not to do better.  I'll certainly share when I know
> more :)


I tend to be a pessimist with regard to off-line evaluation.  It is fine to
do, but if a system is anywhere near best, I think that it should be
considered for A/B testing.

Re: Solr-recommender

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Oct 9, 2013 at 12:54 PM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:

> It sounds like you are doing item-item similarities for recommendations,
>> not actually calculating user-history based recs, is that true?
>>
> Yes that's true so far.  Our recommender system has the ability to provide
> recs based on user history, but we have not deployed this in our app yet.
>  My plan was simply to query based on all the items in the user's "basket"
> - not sure that this would require a different back end?  We're not at the
> moment considering user-user similarity measures.


The items in the basket really are kind of a history (a history of the
items placed in the basket).

It is quite reasonable to use those as a query against indicator fields.

It would be nice to generate indicators (aka binarized item-item LLR
similarities) from a number of different actions such as view, dwell,
scroll, add-to-basket and see which ones or which combos give you the best
recommendation.
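As a sketch of what the binarized item-item LLR similarities look like in practice: the score is the standard log-likelihood ratio (G²) over a 2x2 cooccurrence table, which Mahout computes in its LogLikelihood class. Keep the top-N co-acted items per item by this score, discard the scores, and that set becomes the indicator field. The Python below is an illustrative reimplementation, not Mahout code:

```python
import math

def xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 cooccurrence table:
    k11 = users who acted on both items,
    k12 = item A only, k21 = item B only, k22 = neither."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return 2.0 * (row_entropy + col_entropy - mat_entropy)

# Independent items score ~0 (llr(1, 1, 1, 1) == 0.0);
# strong cooccurrence such as llr(100, 1, 1, 100) scores high.
```

Computing this separately per action type (view, dwell, scroll, add-to-basket) and indexing each set of indicators in its own field is what lets you test which action, or combination, recommends best.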

Re: Solr-recommender

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
On 10/9/13 3:08 PM, Pat Ferrel wrote:
> Solr uses cosine similarity for its queries. The implementation on github uses Mahout LLR for calculating the item-item similarity matrix but when you do the more-like-this query at runtime Solr uses cosine. This can be fixed in Solr; not sure how much work.
It's not clear to me whether it's worth "fixing" this or not.  It would 
certainly complicate scoring calculations when mixing with traditional 
search terms.
>
> It sounds like you are doing item-item similarities for recommendations, not actually calculating user-history based recs, is that true?
Yes that's true so far.  Our recommender system has the ability to 
provide recs based on user history, but we have not deployed this in our 
app yet.  My plan was simply to query based on all the items in the 
user's "basket" - not sure that this would require a different back 
end?  We're not at the moment considering user-user similarity measures.
>
> You bring up a point that we're finding. I'm not so sure we need or want a recommender query API that is separate from the Solr query API. What we are doing on our demo site is putting the output of the Solr-recommender where Solr can index it. Our web app framework then allows very flexible queries against Solr, using simple user history, producing the typical user-history based recommendations, or mixing/boosting based on metadata or contextual data. If we leave the recommender query API in Solr we get web app framework integration for free.
>
> Another point is where the data is stored for the running system. If we allow Solr to index from any storage service that it supports then we also get free integration with most any web app framework and storage service. For the demo site we put the data in a DB and have Solr index it from there. We also store the user history and metadata there. This is supported by most web app frameworks out of the box. You could go a different route and use almost any storage system/file system/content format since Solr supports a wide variety.
>
> Given a fully flexible Solr standard query and indexing scheme all you need do is tweak the query or data source a bit and you have an item-set recommender (shopping cart) or a contextual recommender (for example boost recs from a category) or a pure metadata/content based recommender.
>
> If the query and storage is left to Solr+web app framework then the github version is complete if not done. Solr still needs LLR in the more-like-this queries. Term weights to encode strength scores would also be nice and I agree that both of these could use some work.
I would like to take a look at that version - I may have missed some 
discussion about it; would you post a link, please?
>
> BTW lest we forget this does not imply the Solr-recommender is better than Myrrix or the Mahout-only recommenders. There needs to be some careful comparison of results. Michael, did you do offline or A/B tests during your implementation?

I ran some offline tests using our historical data, but I don't have a 
lot of faith in these beyond the fact they indicate we didn't make any 
obvious implementation errors.  We haven't attempted A/B testing yet 
since our site is so new, and we need to get a meaningful baseline going 
and sort out a lot of other more pressing issues on the site - 
recommendations are only one piece, albeit an important one.


Actually there was an interesting idea for an article posted recently 
about the difficulty of comparing results across systems in this field: 
http://www.docear.org/2013/09/23/research-paper-recommender-system-evaluation-a-quantitative-literature-survey/ 
but that's no excuse not to do better.  I'll certainly share when I know 
more :)

-Mike
>
> On Oct 9, 2013, at 6:13 AM, Michael Sokolov <ms...@safaribooksonline.com> wrote:
>
> Just to add a note of encouragement for the idea of better integration between Mahout and Solr:
>
> On safariflow.com, we've recently converted our recommender, which computes similarity scores w/Mahout, from storing scores and running queries w/Postgres, to doing all that in Solr.  It's been a big improvement, both in terms of indexing speed, and more importantly, the flexibility of the queries we can write.  I believe that having scoring built in to the query engine is a key feature for recommendations.  More and more I am coming to believe that recommendation should just be considered as another facet of search: as one among many variables the system may take into account when presenting relevant information to the user.  In our system, we still clearly separate search from recommendations, and we probably will always do that to some extent, but I think we will start to blend the queries more so that there will be essentially a continuum of query options including more or less "user preference" data.
>
> I think what I'm talking about may be a bit different than what Pat is describing (in implementation terms), since we do LLR calculations off-line in Mahout and then bulk load them into Solr.  We took one of Ted's earlier suggestions to heart, and simply ignored the actual numeric scores: we index the top N similar items for each item.  Later we may incorporate numeric scores in Solr as term weights.  If people are looking for things to do :) I think that would be a great software contribution that could spur this effort onward since it's difficult to accomplish right now given the Solr/Lucene indexing interfaces, but is already supported by the underlying data model and query engine.
>
>
> -Mike
>
> On 10/2/13 12:19 PM, Pat Ferrel wrote:
>> Excellent. From Ellen's description the first Music use may be an implicit preference based recommender using synthetic  data? I'm quickly discovering how flexible Solr use is in many of these cases.
>>
>> Here's another use you may have thought of:
>>
>> Shopping cart recommenders, as goes the intuition, are best modeled as recommending from similar item-sets. If you store all shopping carts as your training data (play lists, watch lists etc.) then as a user adds things to their cart you query for the most similar past carts. Combine the results intelligently and you'll have an item-set recommender. Solr is built to do this item-set similarity. We tried to do this for an ecom site with pure Mahout but the similarity calc in real time stymied us. We knew we'd need Solr but couldn't devote the resources to spin it up.
>>
>> On the Con-side Solr has a lot of stuff you have to work around. It also does not have the ideal similarity measure for many uses (cosine is ok but llr would probably be better). You don't want stop word filtering, stemming, white space based tokenizing or n-grams. You would like explicit weighting. A good thing about Solr is how well it integrates with virtually any doc store independent of the indexing and query. A bit of an oval peg for a round hole.
>>
>> It looks like the similarity code is replaceable if not pluggable. Much of the rest could be trimmed away by config or adherence to conventions I suspect. In the demo site I'm working on I've had to adopt some slightly hacky conventions that I'll describe some day.
>>
>> On Oct 1, 2013, at 10:38 PM, Ted Dunning <te...@gmail.com> wrote:
>>
>>
>> Pat,
>>
>> Ellen and some folks in Britain have been working with some data I produced from synthetic music fans.
>>
>>
>> On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>> Hi Ellen,
>>
>>
>> On Oct 1, 2013, at 12:38 PM, Ted Dunning <te...@gmail.com> wrote:
>>
>>
>> As requested,
>>
>> Pat, meet Ellen.
>>
>> Ellen, meet Pat.
>>
>>
>>
>>
>> On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <pa...@gmail.com> wrote:
>> Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.
>>
>> Things to note:
>> 1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity job. Currently there is only cooccurrence for sparsification, which is far from optimal. This might take the form of a cross RSJ with two DRMs as input. I can't commit to this but would commit to adding it to the XRecommenderJob.
>> 2) output to Solr needs a lot of options implemented and tested. The hand-run test should be made into JUnit tests. I'm slowly doing this.
>> 3) the Solr query API is unimplemented unless someone else is working on that. I'm building one in a demo site but it looks to me like a static recommender API is not going to be all that useful and maybe a document describing how to do it with the Solr query interface would be best, especially for a first step. The reasoning here is that it is so tempting to mix in metadata to the recommendation query that a static API is not so obvious. For the demo site the recommender API will be prototyped in a bunch of ways using models and controllers in Rails. If I'm the one to do a Java Solr-recommender query API it will be after experimenting a bit.
>>
>> Can someone introduce me to Ellen and Tim?
>>
>> On Sep 28, 2013, at 10:59 AM, Ted Dunning <te...@gmail.com> wrote:
>>
>> The one large-ish feature that I think would find general use would be a high performance classifier trainer.
>>
>> For cleanup sorts of things, it would be good to fully integrate the streaming k-means into the normal clustering commands while revamping the command line API.
>>
>> Dmitriy's recent Scala work would help quite a bit before 1.0. Not sure it can make 0.9.
>>
>> For recommendations, I think that the demo system that Pat started with the elaborations by Ellen and Tim would be very good to have.
>>
>> I would be happy to collaborate with somebody on these but am not at all likely to have time to actually do them end to end.
>>
>> Sent from my iPhone
>>
>> On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:
>>
>>> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another release.
>>>
>>> -Grant
>>>
>>> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
>>>
>>>> Sounds right in principle but perhaps a bit soon.
>>>>
>>>> What would define the release?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
>>>>
>>>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>>>
>>>>> -Grant
>>> --------------------------------------------
>>> Grant Ingersoll | @gsingers
>>> http://www.lucidworks.com
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>



Re: Solr-recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Solr uses cosine similarity for its queries. The implementation on github uses Mahout LLR for calculating the item-item similarity matrix but when you do the more-like-this query at runtime Solr uses cosine. This can be fixed in Solr; not sure how much work.

It sounds like you are doing item-item similarities for recommendations, not actually calculating user-history based recs, is that true? 

You bring up a point that we're finding. I'm not so sure we need or want a recommender query API that is separate from the Solr query API. What we are doing on our demo site is putting the output of the Solr-recommender where Solr can index it. Our web app framework then allows very flexible queries against Solr, using simple user history, producing the typical user-history based recommendations, or mixing/boosting based on metadata or contextual data. If we leave the recommender query API in Solr we get web app framework integration for free.

Another point is where the data is stored for the running system. If we allow Solr to index from any storage service that it supports then we also get free integration with most any web app framework and storage service. For the demo site we put the data in a DB and have Solr index it from there. We also store the user history and metadata there. This is supported by most web app frameworks out of the box. You could go a different route and use almost any storage system/file system/content format since Solr supports a wide variety.

Given a fully flexible Solr standard query and indexing scheme all you need do is tweak the query or data source a bit and you have an item-set recommender (shopping cart) or a contextual recommender (for example boost recs from a category) or a pure metadata/content based recommender.  
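A toy sketch of the item-set ("shopping cart") variant mentioned above: treat past carts as the corpus and the current cart as the query. Solr would score this with its own similarity; plain intersection counts are used here only to show the shape of the idea, and nothing below is project code:

```python
def similar_carts(current_cart, past_carts, top_n=3):
    """Rank past carts by how many items they share with the current
    cart and return the best non-empty matches."""
    current = set(current_cart)
    scored = sorted(((len(current & set(cart)), i)
                     for i, cart in enumerate(past_carts)), reverse=True)
    return [past_carts[i] for score, i in scored[:top_n] if score > 0]

def recommend_from_carts(current_cart, past_carts, top_n=3):
    """Suggest items that appear in similar carts but not in the
    current one, most frequent first (ties broken alphabetically)."""
    current = set(current_cart)
    counts = {}
    for cart in similar_carts(current_cart, past_carts, top_n):
        for item in set(cart) - current:
            counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=lambda item: (-counts[item], item))
```

With Solr doing the heavy lifting, "query for the most similar past carts" is just a search over indexed cart documents, which is why the real-time similarity calc that stymied the pure-Mahout attempt stops being a problem.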

If the query and storage is left to Solr+web app framework then the github version is complete if not done. Solr still needs LLR in the more-like-this queries. Term weights to encode strength scores would also be nice and I agree that both of these could use some work.

BTW lest we forget this does not imply the Solr-recommender is better than Myrrix or the Mahout-only recommenders. There needs to be some careful comparison of results. Michael, did you do offline or A/B tests during your implementation?

On Oct 9, 2013, at 6:13 AM, Michael Sokolov <ms...@safaribooksonline.com> wrote:

Just to add a note of encouragement for the idea of better integration between Mahout and Solr:

On safariflow.com, we've recently converted our recommender, which computes similarity scores w/Mahout, from storing scores and running queries w/Postgres, to doing all that in Solr.  It's been a big improvement, both in terms of indexing speed, and more importantly, the flexibility of the queries we can write.  I believe that having scoring built in to the query engine is a key feature for recommendations.  More and more I am coming to believe that recommendation should just be considered as another facet of search: as one among many variables the system may take into account when presenting relevant information to the user.  In our system, we still clearly separate search from recommendations, and we probably will always do that to some extent, but I think we will start to blend the queries more so that there will be essentially a continuum of query options including more or less "user preference" data.

I think what I'm talking about may be a bit different than what Pat is describing (in implementation terms), since we do LLR calculations off-line in Mahout and then bulk load them into Solr.  We took one of Ted's earlier suggestions to heart, and simply ignored the actual numeric scores: we index the top N similar items for each item.  Later we may incorporate numeric scores in Solr as term weights.  If people are looking for things to do :) I think that would be a great software contribution that could spur this effort onward since it's difficult to accomplish right now given the Solr/Lucene indexing interfaces, but is already supported by the underlying data model and query engine.
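That indexing scheme can be sketched in a few lines: take each item's scored neighbors, keep the top N, drop the numeric scores, and emit the survivors as a whitespace-joined text field for Solr to tokenize. The field and doc layout below are illustrative, not the actual safariflow schema:

```python
def to_solr_doc(item_id, scored_similar, n=10):
    """scored_similar is a list of (neighbor_id, score) pairs.
    Returns a doc whose 'similar' field is a plain token stream,
    with the numeric scores deliberately thrown away."""
    top = sorted(scored_similar, key=lambda pair: pair[1], reverse=True)[:n]
    return {"id": item_id, "similar": " ".join(item for item, _ in top)}
```

Reintroducing the scores later as per-term index-time weights is the piece noted above as missing from the current Solr/Lucene indexing interfaces.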


-Mike

On 10/2/13 12:19 PM, Pat Ferrel wrote:
> Excellent. From Ellen's description the first Music use may be an implicit preference based recommender using synthetic  data? I'm quickly discovering how flexible Solr use is in many of these cases.
> 
> Here's another use you may have thought of:
> 
> Shopping cart recommenders, as goes the intuition, are best modeled as recommending from similar item-sets. If you store all shopping carts as your training data (play lists, watch lists etc.) then as a user adds things to their cart you query for the most similar past carts. Combine the results intelligently and you'll have an item-set recommender. Solr is built to do this item-set similarity. We tried to do this for an ecom site with pure Mahout but the similarity calc in real time stymied us. We knew we'd need Solr but couldn't devote the resources to spin it up.
> 
> On the Con-side Solr has a lot of stuff you have to work around. It also does not have the ideal similarity measure for many uses (cosine is ok but llr would probably be better). You don't want stop word filtering, stemming, white space based tokenizing or n-grams. You would like explicit weighting. A good thing about Solr is how well it integrates with virtually any doc store independent of the indexing and query. A bit of an oval peg for a round hole.
> 
> It looks like the similarity code is replaceable if not pluggable. Much of the rest could be trimmed away by config or adherence to conventions I suspect. In the demo site I'm working on I've had to adopt some slightly hacky conventions that I'll describe some day.
> 
> On Oct 1, 2013, at 10:38 PM, Ted Dunning <te...@gmail.com> wrote:
> 
> 
> Pat,
> 
> Ellen and some folks in Britain have been working with some data I produced from synthetic music fans.
> 
> 
> On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
> Hi Ellen,
> 
> 
> On Oct 1, 2013, at 12:38 PM, Ted Dunning <te...@gmail.com> wrote:
> 
> 
> As requested,
> 
> Pat, meet Ellen.
> 
> Ellen, meet Pat.
> 
> 
> 
> 
> On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <pa...@gmail.com> wrote:
> Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.
> 
> Things to note:
> 1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity job. Currently there is only cooccurrence for sparsification, which is far from optimal. This might take the form of a cross RSJ with two DRMs as input. I can't commit to this but would commit to adding it to the XRecommenderJob.
> 2) output to Solr needs a lot of options implemented and tested. The hand-run test should be made into JUnit tests. I'm slowly doing this.
> 3) the Solr query API is unimplemented unless someone else is working on that. I'm building one in a demo site but it looks to me like a static recommender API is not going to be all that useful and maybe a document describing how to do it with the Solr query interface would be best, especially for a first step. The reasoning here is that it is so tempting to mix in metadata to the recommendation query that a static API is not so obvious. For the demo site the recommender API will be prototyped in a bunch of ways using models and controllers in Rails. If I'm the one to do a Java Solr-recommender query API it will be after experimenting a bit.
> 
> Can someone introduce me to Ellen and Tim?
> 
> On Sep 28, 2013, at 10:59 AM, Ted Dunning <te...@gmail.com> wrote:
> 
> The one large-ish feature that I think would find general use would be a high performance classifier trainer.
> 
> For cleanup sorts of things, it would be good to fully integrate the streaming k-means into the normal clustering commands while revamping the command line API.
> 
> Dmitriy's recent Scala work would help quite a bit before 1.0. Not sure it can make 0.9.
> 
> For recommendations, I think that the demo system that Pat started with the elaborations by Ellen and Tim would be very good to have.
> 
> I would be happy to collaborate with somebody on these but am not at all likely to have time to actually do them end to end.
> 
> Sent from my iPhone
> 
> On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:
> 
>> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another release.
>> 
>> -Grant
>> 
>> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
>> 
>>> Sounds right in principle but perhaps a bit soon.
>>> 
>>> What would define the release?
>>> 
>>> Sent from my iPhone
>>> 
>>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
>>> 
>>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>> 
>>>> -Grant
>> --------------------------------------------
>> Grant Ingersoll | @gsingers
>> http://www.lucidworks.com
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 



Re: Solr-recommender

Posted by Ted Dunning <te...@gmail.com>.
Mike,

Thanks for the vote of confidence!


On Wed, Oct 9, 2013 at 6:13 AM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:

> Just to add a note of encouragement for the idea of better integration
> between Mahout and Solr:
>
> On safariflow.com, we've recently converted our recommender, which
> computes similarity scores w/Mahout, from storing scores and running
> queries w/Postgres, to doing all that in Solr.  It's been a big
> improvement, both in terms of indexing speed, and more importantly, the
> flexibility of the queries we can write.  I believe that having scoring
> built in to the query engine is a key feature for recommendations.  More
> and more I am coming to believe that recommendation should just be
> considered as another facet of search: as one among many variables the
> system may take into account when presenting relevant information to the
> user.  In our system, we still clearly separate search from
> recommendations, and we probably will always do that to some extent, but I
> think we will start to blend the queries more so that there will be
> essentially a continuum of query options including more or less "user
> preference" data.
>
> I think what I'm talking about may be a bit different than what Pat is
> describing (in implementation terms), since we do LLR calculations off-line
> in Mahout and then bulk load them into Solr.  We took one of Ted's earlier
> suggestions to heart, and simply ignored the actual numeric scores: we
> index the top N similar items for each item.  Later we may incorporate
> numeric scores in Solr as term weights.  If people are looking for things
> to do :) I think that would be a great software contribution that could
> spur this effort onward since it's difficult to accomplish right now given
> the Solr/Lucene indexing interfaces, but is already supported by the
> underlying data model and query engine.
>
>
> -Mike
>
>
> On 10/2/13 12:19 PM, Pat Ferrel wrote:
>
>> Excellent. From Ellen's description the first Music use may be an
>> implicit preference-based recommender using synthetic data? I'm quickly
>> discovering how flexible Solr use is in many of these cases.
>>
>> Here's another use you may have thought of:
>>
>> Shopping cart recommenders, as goes the intuition, are best modeled as
>> recommending from similar item-sets. If you store all shopping carts as
>> your training data (play lists, watch lists etc.) then as a user adds
>> things to their cart you query for the most similar past carts. Combine the
>> results intelligently and you'll have an item set recommender. Solr is
>> built to do this item-set similarity. We tried to do this for an ecom site
>> with pure Mahout but the similarity calc in real time stymied us. We knew
>> we'd need Solr but couldn't devote the resources to spin it up.
>>
>> On the con side, Solr has a lot of stuff you have to work around. It also
>> does not have the ideal similarity measure for many uses (cosine is OK but
>> LLR would probably be better). You don't want stop word filtering,
>> stemming, white space based tokenizing or n-grams. You would like explicit
>> weighting. A good thing about Solr is how well it integrates with virtually
>> any doc store independent of the indexing and query. A bit of an oval peg
>> for a round hole.
>>
>> It looks like the similarity code is replaceable if not pluggable. Much
>> of the rest could be trimmed away by config or adherence to conventions I
>> suspect. In the demo site I'm working on I've had to adopt some slightly
>> hacky conventions that I'll describe some day.
>>
>> On Oct 1, 2013, at 10:38 PM, Ted Dunning <te...@gmail.com> wrote:
>>
>>
>> Pat,
>>
>> Ellen and some folks in Britain have been working with some data I
>> produced from synthetic music fans.
>>
>>
>> On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>> Hi Ellen,
>>
>>
>> On Oct 1, 2013, at 12:38 PM, Ted Dunning <te...@gmail.com> wrote:
>>
>>
>> As requested,
>>
>> Pat, meet Ellen.
>>
>> Ellen, meet Pat.
>>
>>
>>
>>
>> On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <pa...@gmail.com> wrote:
>> Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout
>> version.
>>
>> Things to note:
>> 1) The pure Mahout XRecommenderJob needs a cross-LLR or a
>> cross-similarity job. Currently there is only cooccurrence for
>> sparsification, which is far from optimal. This might take the form of a
>> cross RSJ with two DRMs as input. I can't commit to this but would commit
>> to adding it to the XRecommenderJob.
>> 2) output to Solr needs a lot of options implemented and tested. The
>> hand-run test should be made into some JUnit tests. I'm slowly doing this.
>> 3) the Solr query API is unimplemented unless someone else is working on
>> that. I'm building one in a demo site but it looks to me like a static
>> recommender API is not going to be all that useful and maybe a document
>> describing how to do it with the Solr query interface would be best,
>> especially for a first step. The reasoning here is that it is so tempting
>> to mix in metadata to the recommendation query that a static API is not so
>> obvious. For the demo site the recommender API will be prototyped in a
>> bunch of ways using models and controllers in Rails. If I'm the one to do
>> a Java Solr-recommender query API, it will be after experimenting a bit.
>>
>> Can someone introduce me to Ellen and Tim?
>>
>> On Sep 28, 2013, at 10:59 AM, Ted Dunning <te...@gmail.com> wrote:
>>
>> The one large-ish feature that I think would find general use would be a
>> high performance classifier trainer.
>>
>> On the cleanup side, it would be good to fully integrate the
>> streaming k-means into the normal clustering commands while revamping the
>> command line API.
>>
>> Dmitriy's recent scala work would help quite a bit before 1.0. Not sure
>> it can make 0.9.
>>
>> For recommendations, I think that the demo system that Pat started with
>> the elaborations by Ellen and Tim would be very good to have.
>>
>> I would be happy to collaborate with somebody on these but am not at all
>> likely to have time to actually do them end to end.
>>
>> Sent from my iPhone
>>
>> On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:
>>
>>  Moving closer to 1.0, removing cruft, etc.  Do we have any more major
>>> features planned for 1.0?  I think we said during 0.8 that we would try to
>>> follow pretty quickly w/ another release.
>>>
>>> -Grant
>>>
>>> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
>>>
>>>  Sounds right in principle but perhaps a bit soon.
>>>>
>>>> What would define the release?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
>>>>
>>>>  Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>>>
>>>>> -Grant
>>>>>
>>>> --------------------------------------------
>>> Grant Ingersoll | @gsingers
>>> http://www.lucidworks.com
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>

Re: Solr-recommender

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
Just to add a note of encouragement for the idea of better integration 
between Mahout and Solr:

On safariflow.com, we've recently converted our recommender, which 
computes similarity scores w/Mahout, from storing scores and running 
queries w/Postgres, to doing all that in Solr.  It's been a big 
improvement, both in terms of indexing speed, and more importantly, the 
flexibility of the queries we can write.  I believe that having scoring 
built in to the query engine is a key feature for recommendations.  More 
and more I am coming to believe that recommendation should just be 
considered as another facet of search: as one among many variables the 
system may take into account when presenting relevant information to the 
user.  In our system, we still clearly separate search from 
recommendations, and we probably will always do that to some extent, but 
I think we will start to blend the queries more so that there will be 
essentially a continuum of query options including more or less "user 
preference" data.

I think what I'm talking about may be a bit different than what Pat is 
describing (in implementation terms), since we do LLR calculations 
off-line in Mahout and then bulk load them into Solr.  We took one of 
Ted's earlier suggestions to heart, and simply ignored the actual 
numeric scores: we index the top N similar items for each item.  Later 
we may incorporate numeric scores in Solr as term weights.  If people 
are looking for things to do :) I think that would be a great software 
contribution that could spur this effort onward since it's difficult to 
accomplish right now given the Solr/Lucene indexing interfaces, but is 
already supported by the underlying data model and query engine.
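The off-line LLR step mentioned above can be sketched in a few lines. This is a stand-alone version of the log-likelihood ratio test on a 2x2 cooccurrence table, following the formulation used for item similarity; the function and parameter names here are illustrative, not Mahout's actual API:

```python
import math

def xlogx(x):
    # Convention: 0 * log(0) == 0.
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized entropy: H(a, b, ...) = (sum)log(sum) - sum of x log x.
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 cooccurrence table:
    k11 = users with both items, k12 = item A only,
    k21 = item B only, k22 = neither."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return 2.0 * (row_entropy + col_entropy - mat_entropy)
```

Item pairs whose score clears a threshold become the "top N similar items" indexed per item; as described above, the numeric score itself can then be thrown away.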


-Mike

On 10/2/13 12:19 PM, Pat Ferrel wrote:
> Excellent. From Ellen's description the first Music use may be an implicit preference-based recommender using synthetic data? I'm quickly discovering how flexible Solr use is in many of these cases.
>
> Here's another use you may have thought of:
>
> Shopping cart recommenders, as goes the intuition, are best modeled as recommending from similar item-sets. If you store all shopping carts as your training data (play lists, watch lists etc.) then as a user adds things to their cart you query for the most similar past carts. Combine the results intelligently and you'll have an item set recommender. Solr is built to do this item-set similarity. We tried to do this for an ecom site with pure Mahout but the similarity calc in real time stymied us. We knew we'd need Solr but couldn't devote the resources to spin it up.
>
> On the con side, Solr has a lot of stuff you have to work around. It also does not have the ideal similarity measure for many uses (cosine is OK but LLR would probably be better). You don't want stop word filtering, stemming, white space based tokenizing or n-grams. You would like explicit weighting. A good thing about Solr is how well it integrates with virtually any doc store independent of the indexing and query. A bit of an oval peg for a round hole.
>
> It looks like the similarity code is replaceable if not pluggable. Much of the rest could be trimmed away by config or adherence to conventions I suspect. In the demo site I'm working on I've had to adopt some slightly hacky conventions that I'll describe some day.
>
> On Oct 1, 2013, at 10:38 PM, Ted Dunning <te...@gmail.com> wrote:
>
>
> Pat,
>
> Ellen and some folks in Britain have been working with some data I produced from synthetic music fans.
>
>
> On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
> Hi Ellen,
>
>
> On Oct 1, 2013, at 12:38 PM, Ted Dunning <te...@gmail.com> wrote:
>
>
> As requested,
>
> Pat, meet Ellen.
>
> Ellen, meet Pat.
>
>
>
>
> On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <pa...@gmail.com> wrote:
> Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.
>
> Things to note:
> 1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity job. Currently there is only cooccurrence for sparsification, which is far from optimal. This might take the form of a cross RSJ with two DRMs as input. I can't commit to this but would commit to adding it to the XRecommenderJob.
> 2) Output to Solr needs a lot of options implemented and tested. The hand-run test should be made into some JUnit tests. I'm slowly doing this.
> 3) The Solr query API is unimplemented unless someone else is working on that. I'm building one in a demo site, but it looks to me like a static recommender API is not going to be all that useful, and maybe a document describing how to do it with the Solr query interface would be best, especially for a first step. The reasoning here is that it is so tempting to mix in metadata to the recommendation query that a static API is not so obvious. For the demo site the recommender API will be prototyped in a bunch of ways using models and controllers in Rails. If I'm the one to do a Java Solr-recommender query API, it will be after experimenting a bit.
>
> Can someone introduce me to Ellen and Tim?
>
> On Sep 28, 2013, at 10:59 AM, Ted Dunning <te...@gmail.com> wrote:
>
> The one large-ish feature that I think would find general use would be a high performance classifier trainer.
>
> On the cleanup side, it would be good to fully integrate the streaming k-means into the normal clustering commands while revamping the command line API.
>
> Dmitriy's recent scala work would help quite a bit before 1.0. Not sure it can make 0.9.
>
> For recommendations, I think that the demo system that Pat started with the elaborations by Ellen and Tim would be very good to have.
>
> I would be happy to collaborate with somebody on these but am not at all likely to have time to actually do them end to end.
>
> Sent from my iPhone
>
> On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:
>
>> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another release.
>>
>> -Grant
>>
>> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
>>
>>> Sounds right in principle but perhaps a bit soon.
>>>
>>> What would define the release?
>>>
>>> Sent from my iPhone
>>>
>>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
>>>
>>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>>
>>>> -Grant
>> --------------------------------------------
>> Grant Ingersoll | @gsingers
>> http://www.lucidworks.com
>>
>>
>>
>>
>>
>
>
>
>
>


Re: Solr-recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Excellent. From Ellen's description the first Music use may be an implicit preference-based recommender using synthetic data? I'm quickly discovering how flexible Solr use is in many of these cases.

Here's another use you may have thought of:

Shopping cart recommenders, as goes the intuition, are best modeled as recommending from similar item-sets. If you store all shopping carts as your training data (play lists, watch lists etc.) then as a user adds things to their cart you query for the most similar past carts. Combine the results intelligently and you'll have an item set recommender. Solr is built to do this item-set similarity. We tried to do this for an ecom site with pure Mahout but the similarity calc in real time stymied us. We knew we'd need Solr but couldn't devote the resources to spin it up.
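The item-set idea above can be sketched in plain Python (with made-up helper names; a real system would let Solr run the similarity query at scale): score past carts against the current one, then vote items from the most similar carts that aren't already in the cart.

```python
from math import sqrt
from collections import defaultdict

def cart_similarity(a, b):
    # Cosine similarity between two item sets (carts).
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend_from_carts(current, past_carts, top_k=10, n_recs=3):
    """Recommend items from the past carts most similar to the current one,
    excluding items already in the cart."""
    ranked = sorted(past_carts,
                    key=lambda c: cart_similarity(current, c),
                    reverse=True)[:top_k]
    votes = defaultdict(float)
    for cart in ranked:
        s = cart_similarity(current, cart)
        if s <= 0.0:
            break  # carts are sorted; the rest share no items
        for item in cart - current:
            votes[item] += s
    return [item for item, _ in
            sorted(votes.items(), key=lambda kv: -kv[1])[:n_recs]]
```

For example, with past carts {milk, bread, eggs}, {milk, bread, butter}, and {beer, chips}, a current cart of {milk, bread} yields eggs and butter but not beer.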

On the con side, Solr has a lot of stuff you have to work around. It also does not have the ideal similarity measure for many uses (cosine is OK but LLR would probably be better). You don't want stop word filtering, stemming, white space based tokenizing or n-grams. You would like explicit weighting. A good thing about Solr is how well it integrates with virtually any doc store independent of the indexing and query. A bit of an oval peg for a round hole.

It looks like the similarity code is replaceable if not pluggable. Much of the rest could be trimmed away by config or adherence to conventions I suspect. In the demo site I'm working on I've had to adopt some slightly hacky conventions that I'll describe some day. 

On Oct 1, 2013, at 10:38 PM, Ted Dunning <te...@gmail.com> wrote:


Pat,

Ellen and some folks in Britain have been working with some data I produced from synthetic music fans.


On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Hi Ellen,


On Oct 1, 2013, at 12:38 PM, Ted Dunning <te...@gmail.com> wrote:


As requested, 

Pat, meet Ellen.

Ellen, meet Pat.




On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <pa...@gmail.com> wrote:
Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.

Things to note:
1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity job. Currently there is only cooccurrence for sparsification, which is far from optimal. This might take the form of a cross RSJ with two DRMs as input. I can't commit to this but would commit to adding it to the XRecommenderJob.
2) Output to Solr needs a lot of options implemented and tested. The hand-run test should be made into some JUnit tests. I'm slowly doing this.
3) The Solr query API is unimplemented unless someone else is working on that. I'm building one in a demo site, but it looks to me like a static recommender API is not going to be all that useful, and maybe a document describing how to do it with the Solr query interface would be best, especially for a first step. The reasoning here is that it is so tempting to mix in metadata to the recommendation query that a static API is not so obvious. For the demo site the recommender API will be prototyped in a bunch of ways using models and controllers in Rails. If I'm the one to do a Java Solr-recommender query API, it will be after experimenting a bit.

Can someone introduce me to Ellen and Tim?

On Sep 28, 2013, at 10:59 AM, Ted Dunning <te...@gmail.com> wrote:

The one large-ish feature that I think would find general use would be a high performance classifier trainer.

On the cleanup side, it would be good to fully integrate the streaming k-means into the normal clustering commands while revamping the command line API.

Dmitriy's recent scala work would help quite a bit before 1.0. Not sure it can make 0.9.

For recommendations, I think that the demo system that Pat started with the elaborations by Ellen and Tim would be very good to have.

I would be happy to collaborate with somebody on these but am not at all likely to have time to actually do them end to end.

Sent from my iPhone

On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:

> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another release.
>
> -Grant
>
> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> Sounds right in principle but perhaps a bit soon.
>>
>> What would define the release?
>>
>> Sent from my iPhone
>>
>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
>>
>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>
>>> -Grant
>
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
>
>
>
>
>






Re: 0.9?

Posted by Pat Ferrel <pa...@gmail.com>.
Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.

Things to note:
1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity job. Currently there is only cooccurrence for sparsification, which is far from optimal. This might take the form of a cross RSJ with two DRMs as input. I can't commit to this but would commit to adding it to the XRecommenderJob.
2) Output to Solr needs a lot of options implemented and tested. The hand-run test should be made into some JUnit tests. I'm slowly doing this.
3) The Solr query API is unimplemented unless someone else is working on that. I'm building one in a demo site, but it looks to me like a static recommender API is not going to be all that useful, and maybe a document describing how to do it with the Solr query interface would be best, especially for a first step. The reasoning here is that it is so tempting to mix in metadata to the recommendation query that a static API is not so obvious. For the demo site the recommender API will be prototyped in a bunch of ways using models and controllers in Rails. If I'm the one to do a Java Solr-recommender query API, it will be after experimenting a bit.
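On point 1, the heart of a cross-similarity job is the cross-cooccurrence product [A'B] of two user-by-item action matrices. A toy in-memory sketch of that arithmetic (real DRMs are distributed; the function name here is hypothetical):

```python
def cross_cooccurrence(A, B):
    """Given two user-by-item 0/1 matrices (lists of rows, same users in
    the same order), return the items(A) x items(B) cross-cooccurrence
    count matrix [A'B]: entry [i][j] counts users who did the primary
    action on item i and the secondary action on item j."""
    n_a, n_b = len(A[0]), len(B[0])
    cross = [[0] * n_b for _ in range(n_a)]
    for ua, ub in zip(A, B):          # iterate over users
        for i, a in enumerate(ua):
            if a:
                for j, b in enumerate(ub):
                    if b:
                        cross[i][j] += 1
    return cross
```

Each entry of the resulting matrix is a k11 cell that could then be LLR-tested for sparsification instead of using raw cooccurrence counts.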

Can someone introduce me to Ellen and Tim?

On Sep 28, 2013, at 10:59 AM, Ted Dunning <te...@gmail.com> wrote:

The one large-ish feature that I think would find general use would be a high performance classifier trainer.  

On the cleanup side, it would be good to fully integrate the streaming k-means into the normal clustering commands while revamping the command line API.

Dmitriy's recent scala work would help quite a bit before 1.0. Not sure it can make 0.9. 

For recommendations, I think that the demo system that Pat started with the elaborations by Ellen and Tim would be very good to have.

I would be happy to collaborate with somebody on these but am not at all likely to have time to actually do them end to end. 

Sent from my iPhone

On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:

> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another release.
> 
> -Grant
> 
> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
> 
>> Sounds right in principle but perhaps a bit soon.  
>> 
>> What would define the release?
>> 
>> Sent from my iPhone
>> 
>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
>> 
>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>> 
>>> -Grant
> 
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
> 
> 
> 
> 
> 


Re: 0.9?

Posted by Ted Dunning <te...@gmail.com>.
The one large-ish feature that I think would find general use would be a high performance classifier trainer.  

On the cleanup side, it would be good to fully integrate the streaming k-means into the normal clustering commands while revamping the command line API.
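The streaming k-means idea in one pass looks roughly like this: assign each point to its nearest centroid if it is close enough, otherwise seed a new centroid (a minimal sketch only; the actual algorithm also adapts the cutoff and bounds the sketch size):

```python
import math

def streaming_kmeans(points, distance_cutoff):
    """One-pass sketch: each point either updates the running mean of the
    nearest centroid within distance_cutoff, or starts a new centroid.
    Returns a list of (mean, count) pairs."""
    centroids = []
    for p in points:
        best, best_d = None, float("inf")
        for idx, (mean, _count) in enumerate(centroids):
            d = math.dist(p, mean)
            if d < best_d:
                best, best_d = idx, d
        if best is None or best_d > distance_cutoff:
            centroids.append((list(p), 1))          # seed a new centroid
        else:
            mean, count = centroids[best]
            count += 1
            # incremental running-mean update
            mean = [m + (x - m) / count for m, x in zip(mean, p)]
            centroids[best] = (mean, count)
    return centroids
```

The output would then typically be re-clustered with an ordinary in-memory k-means, which is the part that would need wiring into the normal clustering commands.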

Dmitriy's recent scala work would help quite a bit before 1.0. Not sure it can make 0.9. 

For recommendations, I think that the demo system that Pat started with the elaborations by Ellen and Tim would be very good to have.

I would be happy to collaborate with somebody on these but am not at all likely to have time to actually do them end to end. 

Sent from my iPhone

On Sep 28, 2013, at 12:40, Grant Ingersoll <gs...@apache.org> wrote:

> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another release.
> 
> -Grant
> 
> On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:
> 
>> Sounds right in principle but perhaps a bit soon.  
>> 
>> What would define the release?
>> 
>> Sent from my iPhone
>> 
>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
>> 
>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>> 
>>> -Grant
> 
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
> 
> 
> 
> 
> 

Re: 0.9?

Posted by Grant Ingersoll <gs...@apache.org>.
Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another release.

-Grant

On Sep 28, 2013, at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:

> Sounds right in principle but perhaps a bit soon.  
> 
> What would define the release?
> 
> Sent from my iPhone
> 
> On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:
> 
>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>> 
>> -Grant

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com






Re: 0.9?

Posted by Ted Dunning <te...@gmail.com>.
Sounds right in principle but perhaps a bit soon.  

What would define the release?

Sent from my iPhone

On Sep 27, 2013, at 7:48, Grant Ingersoll <gs...@apache.org> wrote:

> Anyone interested in thinking about 0.9 in the early Nov. time frame?
> 
> -Grant