Posted to solr-user@lucene.apache.org by Achim Domma <do...@procoders.net> on 2013/05/21 11:47:07 UTC
MoreLikeThisHandler + Facets
I'm calling the MoreLikeThisHandler with a query like "id:some_doc_id", because I would like to get documents which are similar to one specific document. I restrict the result to 25 rows and I calculate facets for some fields.
On what data are those facets calculated? According to the documentation, they are calculated over the similar documents, which is the main difference from the default search handler. But over how many of them? Is it possible to restrict that set of documents somehow? I would like my facets to be calculated based only on the top 1000 most similar documents.
kind regards,
Achim
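For reference, a request of the kind described above might be assembled roughly like this. It is only a sketch: the handler path `/mlt`, the core name `collection1`, and the field names `title`, `body`, `category` and `author` are assumptions and depend on the local solrconfig.xml and schema. `q`, `rows`, `facet`, `facet.field` and `mlt.fl` are standard Solr parameters.

```python
from urllib.parse import urlencode


def build_mlt_facet_url(base_url, doc_id, facet_fields, mlt_fields, rows=25):
    """Build a MoreLikeThisHandler request URL with faceting enabled."""
    params = [
        ("q", f"id:{doc_id}"),             # the single reference document
        ("mlt.fl", ",".join(mlt_fields)),  # fields used to judge similarity
        ("rows", str(rows)),               # number of similar documents returned
        ("facet", "true"),
        ("wt", "json"),
    ]
    # One facet.field parameter per facet field.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}?{urlencode(params)}"


url = build_mlt_facet_url(
    "http://localhost:8983/solr/collection1/mlt",  # assumed handler path
    "some_doc_id",
    facet_fields=["category", "author"],  # hypothetical facet fields
    mlt_fields=["title", "body"],         # hypothetical similarity fields
)
print(url)
```

Note that `rows` only limits how many documents come back in the response. Nothing in this request limits the set of similar documents that the facet counts are computed over, which is the gap the question is about.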
Re: MoreLikeThisHandler + Facets
Posted by Otis Gospodnetic <ot...@gmail.com>.
The other option is to use various MLT params to return fewer similar
documents to begin with.
Otis
Solr & ElasticSearch Support
http://sematext.com/
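One way to read this suggestion: tighten the term-selection parameters so that the generated MLT query contains fewer, more selective terms and therefore matches fewer documents. The sketch below merges such settings into an existing parameter set. `mlt.mintf`, `mlt.mindf`, `mlt.minwl` and `mlt.maxqt` are documented MoreLikeThis parameters, but the values shown are illustrative guesses, and the field names are hypothetical. Whether this shrinks the similar set enough depends on the data.

```python
from urllib.parse import urlencode

# Stricter term selection; values are illustrative, not recommendations.
STRICT_MLT_PARAMS = {
    "mlt.mintf": 3,   # skip terms occurring fewer than 3 times in the source doc
    "mlt.mindf": 10,  # skip terms occurring in fewer than 10 documents
    "mlt.minwl": 4,   # skip words shorter than 4 characters
    "mlt.maxqt": 10,  # use at most 10 terms in the generated query
}


def with_strict_mlt(params, overrides=STRICT_MLT_PARAMS):
    """Return a copy of params with the stricter MLT settings applied."""
    merged = dict(params)
    merged.update(overrides)
    return merged


base = {
    "q": "id:some_doc_id",
    "mlt.fl": "title,body",     # hypothetical similarity fields
    "rows": 25,
    "facet": "true",
    "facet.field": "category",  # hypothetical facet field
}
print(urlencode(with_strict_mlt(base)))
```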
Re: MoreLikeThisHandler + Facets
Posted by Jack Krupansky <ja...@basetechnology.com>.
I think I follow. AFAIK, Solr does not have a provision for limiting
faceting to the "top n" documents, but that does seem like a reasonable
feature request. At the Lucene level, I presume it would simply be a matter
of having a hit collector that only accepts the top n documents. But I'm not
familiar enough with the internal details of the Solr faceting code to say.
-- Jack Krupansky
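Since Solr has no built-in "top n" faceting as far as this thread establishes, one possible workaround is to count facet values on the client. The sketch below assumes the application has already fetched the top n similar documents, in rank order, as a list of dicts that include the facet field (for example by requesting rows=1000 with only the needed fields). The function name and document shape are hypothetical, and this is not how Solr computes facets internally.

```python
from collections import Counter


def facet_top_n(ranked_docs, field, n=1000):
    """Count values of `field` over the first n documents of a ranked list."""
    counts = Counter()
    for doc in ranked_docs[:n]:
        value = doc.get(field)
        if value is None:
            continue  # document has no value for this field
        # Multi-valued fields arrive as lists; treat single values the same way.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    # Most frequent values first, like a facet sorted by count.
    return counts.most_common()


docs = [
    {"id": "1", "category": "a"},
    {"id": "2", "category": ["a", "b"]},
    {"id": "3", "category": "c"},
]
print(facet_top_n(docs, "category", n=2))
```

The trade-off is that n documents have to be transferred on every request, so this is only practical for moderate n and a small field list.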
Re: MoreLikeThisHandler + Facets
Posted by Achim Domma <do...@procoders.net>.
Our current index contains nearly 400k documents and will grow to a few million. Our "more like this" search is always based on a single document, so my query is "id:some_doc_id". For such a query I usually get at least 150k "similar" documents. This definition of "similar" is way too relaxed. Usually only a few hundred or a few thousand documents near the reference document are really of any interest to our users.
Now assume that there are some facet values which appear very often in the similar documents starting at position 50k, but usually not near the reference document. Such a facet value will currently show up in my facet results. If I use this facet value for filtering, I restrict the result to documents which are not of any interest to the user.
We want to provide facets that allow the user to explore and drill down into the documents in the near neighborhood of our reference document.
If I'm on the completely wrong track, please let me know. I'm open to any suggestions. Is it possible that our definition of "similar" simply does not match Solr's model? I would also be willing to dig into the code and implement a custom similarity. But currently it feels like I don't have the basic concepts right. Any hints and guidance would be very welcome.
kind regards,
Achim
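The drill-down behaviour described here can be sketched on the client side, assuming the application holds the similar documents as a rank-ordered list of dicts. The function name, the document shape and the cutoff n are hypothetical. The point is only that filtering happens inside the top n, so a facet value that is frequent far down the ranking cannot pull in distant documents.

```python
def drill_down(ranked_docs, field, value, n=1000):
    """Keep the top-n documents whose `field` contains `value`, in rank order."""

    def has_value(doc):
        field_value = doc.get(field)
        if isinstance(field_value, list):  # multi-valued field
            return value in field_value
        return field_value == value

    return [doc for doc in ranked_docs[:n] if has_value(doc)]


ranked = [
    {"id": 1, "category": "near"},
    {"id": 2, "category": ["near", "far"]},
    {"id": 3, "category": "far"},  # outside the neighborhood when n=2
]
print([doc["id"] for doc in drill_down(ranked, "category", "far", n=2)])
```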
Re: MoreLikeThisHandler + Facets
Posted by Jack Krupansky <ja...@basetechnology.com>.
Any particular reason you would want to limit the documents for facet
calculation? I mean, the whole point of the facet numbers is to let users
know what's out there. You must have some other rationale in mind - what is
it?
-- Jack Krupansky