Posted to solr-user@lucene.apache.org by "Diego Ceccarelli (BLOOMBERG/ LONDON)" <dc...@bloomberg.net> on 2018/10/04 13:33:52 UTC

Re: solr and diversification

The use case is ranking news, Joel. And yes, I have the feeling that it might improve relevance; around 2011/2012 there was a lot of work on this in academia.

Thanks Tim, I'll check out MMR. 
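
For reference, a minimal sketch of the greedy MMR selection from the Carbonell and Goldstein paper: at each step it picks the candidate maximizing lam * relevance - (1 - lam) * (max similarity to anything already selected). The function name `mmr`, the `lam` default, and the precomputed `relevance` list and `similarity` matrix are assumptions made for illustration; nothing here is an existing Solr API.

```python
def mmr(relevance, similarity, k, lam=0.7):
    """Greedy maximal marginal relevance selection.

    relevance:  query-document scores, one per candidate (hypothetical input)
    similarity: square matrix of document-document similarities in [0, 1]
    k:          number of documents to select
    lam:        trade-off; 1.0 means pure relevance, 0.0 means pure diversity
    Returns the indices of the selected documents, in selection order.
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def marginal(i):
            # Redundancy is the similarity to the closest already-selected doc.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: docs 0 and 1 are near-duplicates, doc 2 is about something else.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
diverse_top2 = mmr(relevance, similarity, k=2, lam=0.5)
```

With `lam=1.0` this degenerates to plain relevance ranking, so the trade-off could be exposed as a request parameter.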

From: solr-user@lucene.apache.org At: 09/28/18 20:24:44 To: solr-user@lucene.apache.org
Subject: Re: solr and diversification

Interesting, I had not heard of MMR.


Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Sep 28, 2018 at 10:43 AM Tim Allison <ta...@apache.org> wrote:

> If you haven’t already, might want to check out maximal marginal
> relevance...original paper: Carbonell and Goldstein.
>
> On Thu, Sep 27, 2018 at 7:29 PM Joel Bernstein <jo...@gmail.com> wrote:
>
> > Yeah, I think your plan sounds fine.
> >
> > Do you have a specific use case for diversity of results? I've been
> > wondering if diversity of results would provide better perceived
> > relevance.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Sep 27, 2018 at 1:39 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > dceccarelli4@bloomberg.net> wrote:
> >
> > > Yeah, I think Kmeans might be a way to implement the "top 3 stories
> > > that are more distant", but you can also have a more naïve (and
> > > faster) strategy, like:
> > >  - send a threshold with the request
> > >  - scan the documents in order of relevance score
> > >  - select the top documents that have diversity > threshold.
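
The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: it reads "diversity > threshold" as "distance to every already selected document exceeds the threshold", and the names `diversify_by_threshold` and `jaccard_distance` are hypothetical. Jaccard distance on token sets merely stands in for whatever metric the request would name.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| on token sets; 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def diversify_by_threshold(ranked_docs, k, threshold, distance=jaccard_distance):
    """Single pass over docs already sorted by relevance score.

    A doc is kept only if it is farther than `threshold` from every doc
    kept so far; the scan stops once k docs are selected.
    Returns indices into ranked_docs.
    """
    selected = []
    for idx, doc in enumerate(ranked_docs):
        if all(distance(doc, ranked_docs[j]) > threshold for j in selected):
            selected.append(idx)
            if len(selected) == k:
                break
    return selected


# Toy result list, already in relevance order (token sets instead of real docs).
ranked = [
    {"fed", "rates"},
    {"fed", "rates", "hike"},  # near-duplicate of the first story
    {"oil", "opec"},
    {"fed", "oil"},
]
picked = diversify_by_threshold(ranked, k=3, threshold=0.5)
```

The scan is a single pass in relevance order, so it can stop early, which is what makes it cheaper than clustering; the cost is that the result depends heavily on the chosen threshold.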
> > >
> > > I would make the strategy pluggable, so that it can be selected from the request.
> > >
> > > From: solr-user@lucene.apache.org At: 09/27/18 18:25:43 To: Diego
> > > Ceccarelli (BLOOMBERG/ LONDON), solr-user@lucene.apache.org
> > > Subject: Re: solr and diversification
> > >
> > > > I've thought about this problem a little bit. What I was considering
> > > > was using Kmeans clustering to cluster the top 50 docs, then pulling
> > > > the top scoring doc from each cluster as the top documents. This
> > > > should be fast and effective at getting diversity.
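
A rough sketch of the clustering idea above, in plain Python so that it is self-contained. Assumptions: documents are already available as dense vectors, the farthest-first initialisation is just one deterministic choice, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic init: start at the first point, then repeatedly add
    the point that is farthest from its nearest chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_doc_per_cluster(scores, vectors, k):
    """Cluster the top docs, keep the best-scoring doc of each cluster."""
    if not vectors:
        return []
    k = min(k, len(vectors))
    labels = kmeans(vectors, k)
    best = {}
    for idx, label in enumerate(labels):
        if label not in best or scores[idx] > scores[best[label]]:
            best[label] = idx
    return sorted(best.values(), key=lambda i: scores[i], reverse=True)


# Toy data: two tight pairs plus one outlier, listed in relevance order.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
diverse = top_doc_per_cluster(scores, vectors, k=3)
```

With only about 50 candidate documents, a handful of Lloyd iterations is cheap; the vector extraction would likely dominate the cost.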
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Thu, Sep 27, 2018 at 1:20 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > > dceccarelli4@bloomberg.net> wrote:
> > >
> > > > Hi,
> > > >
> > > > > I'm considering writing a component for diversifying the results. I
> > > > > know that diversification can be achieved by using grouping, but I'm
> > > > > thinking about something different and query-biased.
> > > > > The idea is to have something that gets applied after the normal
> > > > > retrieval and selects the top k most diverse documents based on some
> > > > > distance metric:
> > > >
> > > > Example:
> > > > imagine that you are asking for 10 rows, and you set diversify.rows=3
> > > > diversity.metric=tfidf  diversify.field=body
> > > >
> > > > > Solr might retrieve the top 10 rows as usual, extract tfidf vectors
> > > > > for the bodies and select the top 3 stories that are most distant
> > > > > according to the cosine similarity.
> > > > > This would be different from grouping because documents will be
> > > > > 'collapsed' or not based on the subset of documents retrieved for the
> > > > > query.
> > > > > Do you think it would make sense to have it as a component? Any
> > > > > feedback / ideas?
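
To make the example above concrete, here is a small sketch of the post-retrieval step. Assumptions: bodies are tokenised by whitespace, the smoothed idf formula is just one common variant, and "most distant" is read as a greedy max-min selection that starts from the top-ranked document. All function names are hypothetical and none of this is an existing Solr component.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Sparse tf-idf vectors (dicts) computed over the retrieved docs only."""
    n = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        # Smoothed idf, one common variant: log((1 + n) / (1 + df)) + 1
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_diverse(vectors, k):
    """Keep the top-ranked doc, then repeatedly add the doc whose highest
    similarity to the already selected docs is lowest."""
    if not vectors:
        return []
    selected = [0]
    while len(selected) < min(k, len(vectors)):
        remaining = [i for i in range(len(vectors)) if i not in selected]
        nxt = min(remaining,
                  key=lambda i: max(cosine(vectors[i], vectors[j])
                                    for j in selected))
        selected.append(nxt)
    return selected


# Toy stand-in for the retrieved bodies, in relevance order.
bodies = [
    "fed raises rates",
    "fed raises rates again",
    "oil prices fall",
    "tech stocks rally",
]
vectors = tfidf_vectors([body.split() for body in bodies])
top3 = select_diverse(vectors, k=3)
```

Because the idf statistics here come only from the retrieved documents, the notion of "distance" depends on the query, which is the query-biased behaviour described above.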
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
>