Posted to solr-user@lucene.apache.org by Paul <pa...@nines.org> on 2010/07/14 22:16:02 UTC

limiting the total number of documents matched

I'd like to limit the total number of documents that are returned for
a search, particularly when the sort order is not based on relevancy.

In other words, if the user searches for a very common term, they
might get tens of thousands of hits, and if they sort by "title", then
highly relevant documents will be interspersed with barely relevant
ones. I'd like to limit the results to the 1000 most relevant
documents, then sort those by title.

Is there a way to do this?

I guess I could always retrieve the top 1000 documents and sort them
in the client, but that seems particularly inefficient. I can't find
any other way to do this, though.

Thanks,
Paul
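The client-side fallback Paul mentions (fetch the top hits, then sort them locally) can be sketched in a few lines of Python. This assumes the hits have already been fetched from Solr (for example with `fl=id,title,score`) into dicts with hypothetical `score` and `title` keys; the helper name is made up for illustration.

```python
import heapq
from typing import Any


def top_n_then_sort(docs: list[dict[str, Any]], n: int = 1000,
                    sort_field: str = "title") -> list[dict[str, Any]]:
    """Keep the n highest-scoring hits, then re-order only those by sort_field."""
    # Step 1: pick the n most relevant hits without fully sorting everything.
    top = heapq.nlargest(n, docs, key=lambda d: d["score"])
    # Step 2: sort that subset by the display field (e.g. title).
    return sorted(top, key=lambda d: d[sort_field])


hits = [
    {"title": "Chaucer", "score": 0.9},
    {"title": "Austen", "score": 0.1},
    {"title": "Byron", "score": 0.5},
]
print([d["title"] for d in top_n_then_sort(hits, n=2)])  # ['Byron', 'Chaucer']
```

`heapq.nlargest` avoids sorting every hit by score when only the top n are needed; the second sort only touches those n documents.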

Re: limiting the total number of documents matched

Posted by Lance Norskog <go...@gmail.com>.
Yes, multiple (radix) sorts work and you can use the score value. The
sort parameters come in order, most important to least important.

This sorts first by score, and then documents with the same score are
sorted by field f:

sort=score+desc,f+asc
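The ordering Lance describes can be mimicked in Python to show what the two-key sort does: higher score first, with ties broken by field `f` ascending. The documents and the field name `f` are illustrative only. On its own, this orders the whole result set; it does not cut it down to a fixed number of hits.

```python
from urllib.parse import urlencode


def solr_sort_order(docs, field="f"):
    """Mimic sort=score desc,f asc: higher score first, ties broken by field ascending."""
    return sorted(docs, key=lambda d: (-d["score"], d[field]))


docs = [
    {"score": 1.0, "f": "b"},
    {"score": 2.0, "f": "z"},
    {"score": 1.0, "f": "a"},
]
ordered = solr_sort_order(docs)  # score 2.0 first, then the two 1.0 ties as "a", "b"

# The same sort as a request parameter: urlencode turns spaces into '+' and the
# comma into %2C, which the server decodes back to "score desc,f asc".
params = urlencode({"q": "common term", "sort": "score desc,f asc"})
print(params)  # q=common+term&sort=score+desc%2Cf+asc
```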



On Wed, Jul 14, 2010 at 2:46 PM, Paul <pa...@nines.org> wrote:
> I thought of another way to do it, but I still have one thing I don't
> know how to do. I could do the search without sorting for the 50th
> page, then look at the relevancy score on the first item on that page,
> then repeat the search, but add score > that relevancy as a parameter.
> Is it possible to do a search with "score:[5 to *]"? It didn't work in
> my first attempt.



-- 
Lance Norskog
goksron@gmail.com

Re: limiting the total number of documents matched

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Wed, Jul 14, 2010 at 5:46 PM, Paul <pa...@nines.org> wrote:
> I thought of another way to do it, but I still have one thing I don't
> know how to do. I could do the search without sorting for the 50th
> page, then look at the relevancy score on the first item on that page,
> then repeat the search, but add score > that relevancy as a parameter.
> Is it possible to do a search with "score:[5 to *]"? It didn't work in
> my first attempt.

frange could possibly help (range query on an arbitrary function).
http://www.lucidimagination.com/blog/tag/frange/

So perhaps something like
q={!frange l=0.85}query($qq)
qq=<the original relevancy query>

where 0.85 is the lower bound you want for scores and qq is the normal
relevancy query

-Yonik
http://www.lucidimagination.com
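Yonik's suggestion can be assembled as request parameters in Python. This is a sketch under assumptions: the query text, the 0.85 threshold, and the `title` sort field are placeholders, and the helper name is hypothetical; the `{!frange l=...}query($qq)` form and the `qq` parameter come from his message. Raw relevancy scores are not normalized across queries, so a fixed threshold may need to be chosen per query.

```python
from urllib.parse import urlencode


def frange_params(user_query: str, min_score: float,
                  sort: str = "title asc") -> dict[str, str]:
    """Build params that keep only docs whose relevancy score is >= min_score."""
    return {
        # frange filters on the value of a function; query($qq) evaluates to
        # each document's relevancy score for the query held in qq.
        "q": f"{{!frange l={min_score}}}query($qq)",
        "qq": user_query,  # the normal relevancy query, dereferenced by $qq
        "sort": sort,      # order the surviving docs by a field, not by score
    }


params = frange_params("common term", 0.85)
query_string = urlencode(params)  # ready to append to a /select URL
```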



Re: limiting the total number of documents matched

Posted by Paul <pa...@nines.org>.
I thought of another way to do it, but I still have one thing I don't
know how to do. I could do the search without sorting for the 50th
page, then look at the relevancy score on the first item on that page,
then repeat the search, but add score > that relevancy as a parameter.
Is it possible to do a search with "score:[5 to *]"? It didn't work in
my first attempt.
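Paul's two-pass idea needs the score of the hit at the cutoff position. A hypothetical helper can compute the `start` offset of "the first item on page N" and the parameters for fetching just that one hit with its score. It assumes 20 rows per page, so page 50 begins at the 981st hit (offset 980); `start`, `rows`, and `fl` are standard Solr request parameters, while the function names are made up.

```python
def page_start(page: int, rows_per_page: int = 20) -> int:
    """0-based offset of the first hit on a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * rows_per_page


def cutoff_probe_params(user_query: str, page: int = 50,
                        rows_per_page: int = 20) -> dict[str, str]:
    """Params that fetch only the first hit of the given page, with its score."""
    return {
        "q": user_query,
        "start": str(page_start(page, rows_per_page)),
        "rows": "1",       # only the one document at the cutoff is needed
        "fl": "id,score",  # ask for the relevancy score alongside the id
        # no sort parameter: results come back in relevancy order by default
    }
```

The score returned for that single hit could then serve as the lower bound in a second query, for example via the `frange` approach Yonik describes.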

On Wed, Jul 14, 2010 at 5:34 PM, Paul <pa...@nines.org> wrote:
> I was hoping for a way to do this purely by configuration and making
> the correct GET requests, but if there is a way to do it by creating a
> custom Request Handler, I suppose I could plunge into that. Would that
> yield the best results, and would that be particularly difficult?

Re: limiting the total number of documents matched

Posted by Paul <pa...@nines.org>.
I was hoping for a way to do this purely by configuration and making
the correct GET requests, but if there is a way to do it by creating a
custom Request Handler, I suppose I could plunge into that. Would that
yield the best results, and would that be particularly difficult?

On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin
<KN...@globeandmail.com> wrote:
> So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against the IDs and sort by your other field. 1000 seems like a lot for that approach, but who knows until you try it on your data.

RE: limiting the total number of documents matched

Posted by "Nagelberg, Kallin" <KN...@globeandmail.com>.
So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against the IDs and sort by your other field. 1000 seems like a lot for that approach, but who knows until you try it on your data.

-Kallin Nagelberg 
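Kallin's two-query approach can be sketched as parameter builders in Python. The first request asks only for the IDs of the top hits (`fl=id`, `rows=1000`, default relevancy order); the second restricts to those IDs and sorts by title. The `id` and `title` field names and the helper names are assumptions, and the `id:(...)` boolean query is just one way to express the ID filter. With the default `maxBooleanClauses` of 1024 in solrconfig.xml, 1000 IDs should fit, but much larger lists may not.

```python
def first_pass_params(user_query: str, n: int = 1000) -> dict[str, str]:
    """Top-n hits by relevancy, returning only the id field to keep it fast."""
    return {"q": user_query, "rows": str(n), "fl": "id"}


def quote_term(value: str) -> str:
    """Wrap an id in double quotes, escaping backslashes and quotes for Lucene syntax."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def second_pass_params(ids: list[str], sort_field: str = "title") -> dict[str, str]:
    """Fetch exactly the given ids, ordered by sort_field instead of score."""
    if not ids:
        raise ValueError("need at least one id")
    id_query = "id:(" + " OR ".join(quote_term(i) for i in ids) + ")"
    return {"q": id_query, "rows": str(len(ids)), "sort": f"{sort_field} asc"}
```

Setting `rows` to the number of IDs makes the second request return the whole re-sorted subset in one page.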

