Posted to solr-user@lucene.apache.org by Jesus Gabriel y Galan <je...@buongiorno.com> on 2011/06/02 12:21:12 UTC

Question about sorting by coordination factor

Hi,

I am trying to solve a sorting problem using Solr. The sorting requirements are a bit complicated.
I have to sort the documents by three different criteria:

- First, by the number of keywords that match (coordination factor).
- Then, among the documents that match the same number of keywords, the documents that match a user value (country) come first, followed by the rest.
- Then, within each of those two blocks, by a document value (popularity).
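To make the target ordering concrete, here is a minimal sketch in plain Python (not Solr). The `Doc` class, the sample documents, and the whitespace-based keyword matching are all assumptions made up for illustration; only the three-level ordering comes from the list above.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    description: str
    country_uk: bool   # hypothetical flag: does the doc match the user's country?
    popularity: int


def matched_keywords(doc, keywords):
    """Count how many distinct query keywords occur in the description."""
    words = set(doc.description.lower().split())
    return sum(1 for kw in set(keywords) if kw.lower() in words)


def sort_key(doc, keywords):
    # All three criteria are descending, so each component is negated.
    return (
        -matched_keywords(doc, keywords),  # 1. number of matching keywords
        -int(doc.country_uk),              # 2. country matches first
        -doc.popularity,                   # 3. popularity
    )


docs = [
    Doc("a", "football news", False, 50),
    Doc("b", "football and tennis news", False, 10),
    Doc("c", "football results", True, 5),
    Doc("d", "tennis and football", True, 1),
]
keywords = ["football", "tennis"]

ranked = sorted(docs, key=lambda d: sort_key(d, keywords))
order = [d.doc_id for d in ranked]
print(order)
```

Documents "d" and "b" match both keywords and so form the first group; inside each group the country match decides before popularity does.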

I have managed to get the second and third criteria working with a query like this:

http://localhost:8983/solr/select/?q=description%3Afootball&version=2.2&start=0&rows=10&indent=on&qq=country_uk:true&sort=map%28query%28$qq,-1%29,0,9999999,1%29%20desc,popularity%20desc

Here the query function returns a positive value (the score) for the documents that match the country and the default of -1 for the ones that don't. The map function then converts every value between 0 and 9999999 to 1, so I end up with two blocks of documents with sort values of 1 and -1. That works for me because ties are then sorted by popularity. But as you can see, this only searches for one keyword.
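The behaviour of that sort expression can be emulated in a few lines of Python. This is only a sketch of the semantics described above, not Solr code: `query` and `solr_map` are stand-ins written for this example, and the `qq_score` values (with `None` meaning "did not match `$qq`") are invented.

```python
def query(score_or_none, default):
    """Stand-in for query(subquery, default): the subquery score if the
    document matched, otherwise the default value."""
    return default if score_or_none is None else score_or_none


def solr_map(x, lo, hi, target):
    """Stand-in for map(x, min, max, target): values inside [min, max]
    become target, everything else passes through unchanged."""
    return target if lo <= x <= hi else x


# Invented sample data: qq_score is the score of country_uk:true, or None.
docs = [
    {"id": "a", "qq_score": None, "popularity": 50},
    {"id": "b", "qq_score": 0.7, "popularity": 10},
    {"id": "c", "qq_score": 2.3, "popularity": 30},
    {"id": "d", "qq_score": None, "popularity": 5},
]


def country_flag(doc):
    # map(query($qq,-1),0,9999999,1)
    return solr_map(query(doc["qq_score"], -1), 0, 9999999, 1)


flags = [country_flag(d) for d in docs]

# sort=<flag> desc, popularity desc
ranked = sorted(docs, key=lambda d: (-country_flag(d), -d["popularity"]))
order = [d["id"] for d in ranked]
print(flags, order)
```

Because every matching document collapses to exactly 1, the different `qq_score` values of "b" and "c" no longer matter and popularity alone decides their order.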

My problem is with the first requirement, when we search for more than one keyword. As I understand it, I want to sort by the coordination factor, which is the number of query keywords that each document matches. The problem is that there is no function query I can use to get that value, so I don't know how to proceed. I tried to work out whether the regular score could be split into ranges, each corresponding to the same number of matched keywords, but the score depends on several other things and its range of values can be arbitrary, so I am not able to write such a function.

Is there any solution to this?

Thanks,

Jesus.

Re: Question about sorting by coordination factor

Posted by Erick Erickson <er...@gmail.com>.
Ahhh, you're right. I know there's been some discussion in the past about
how to find out the number of terms that matched, but don't remember the
outcome off-hand. You might try searching the mail archive for something like
"number of matching terms" or some such.

Sorry I'm not more help
Erick

Re: Question about sorting by coordination factor

Posted by Jesus Gabriel y Galan <je...@buongiorno.com>.
On 02/06/11 13:32, Erick Erickson wrote:
> Say you're trying to match terms A, B, C. Would something like
>
> (A AND B AND C)^1000 OR (A AND B)^100 OR (A AND C)^100 OR (B AND
> C)^100 OR A OR B OR C
>
> work? It wouldn't be an absolute ordering, but it would tend to
> push the documents where all three terms matched toward
> the top.

The problem with this is that it would give a better score to the documents with the most matches, but I would still have to sort within those groups. So I'd need a sort=score,xxx,yyy, and the score would not be equal for the documents that match the same number of keywords.
I would need as many groups as there are keywords, and within each group all documents would need to have the same value for that sorting criterion (score, a function, or whatever), so that they tie and fall through to the next sorting criterion.
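The tie requirement can be illustrated with a small Python sketch. The documents and their scores below are made up; the point is only that relevance scores are rarely identical, so a secondary sort key never gets a chance to act unless the primary key is a coarse value such as the match count.

```python
# Invented sample data: "score" stands for a boosted relevance score.
docs = [
    {"id": "a", "matches": 2, "score": 1.92, "country": False, "popularity": 90},
    {"id": "b", "matches": 2, "score": 1.87, "country": True, "popularity": 10},
    {"id": "c", "matches": 1, "score": 0.40, "country": True, "popularity": 70},
    {"id": "d", "matches": 1, "score": 0.35, "country": True, "popularity": 80},
]


def ids(ranked):
    return [d["id"] for d in ranked]


# sort=score desc, country desc, popularity desc
by_score = sorted(
    docs, key=lambda d: (-d["score"], -int(d["country"]), -d["popularity"])
)

# sort=<number of matched keywords> desc, country desc, popularity desc
by_count = sorted(
    docs, key=lambda d: (-d["matches"], -int(d["country"]), -d["popularity"])
)

print(ids(by_score))
print(ids(by_count))
```

With the score as the first key every document is its own group, so country and popularity are ignored. With the match count as the first key, "b" moves ahead of "a" on country, and "d" moves ahead of "c" on popularity.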

Thanks,

Jesus.

Re: Question about sorting by coordination factor

Posted by Erick Erickson <er...@gmail.com>.
Say you're trying to match terms A, B, C. Would something like

(A AND B AND C)^1000 OR (A AND B)^100 OR (A AND C)^100 OR (B AND
C)^100 OR A OR B OR C

work? It wouldn't be an absolute ordering, but it would tend to
push the documents where all three terms matched toward
the top.

It would get really cumbersome if there were lots of terms, though.

Best
Erick
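To show how quickly the boosted query above grows, here is a rough Python sketch that generates it for any list of terms. The helper names and the boost rule (10 to the power of the clause size, no boost for single terms) are assumptions chosen to reproduce the three-term example; nothing here is a Solr API.

```python
from itertools import combinations


def boosted_query(terms, base=10):
    """Build one OR clause per non-empty subset of terms, largest subsets
    first, boosting each multi-term clause by base ** (number of terms)."""
    clauses = []
    for size in range(len(terms), 0, -1):
        for combo in combinations(terms, size):
            if size == 1:
                clauses.append(combo[0])
            else:
                clauses.append(f"({' AND '.join(combo)})^{base ** size}")
    return " OR ".join(clauses)


def clause_count(n_terms):
    """Number of clauses generated: every non-empty subset of the terms."""
    return 2 ** n_terms - 1


print(boosted_query(["A", "B", "C"]))
print(clause_count(10))
```

The number of clauses is 2^n - 1 for n terms, which is 7 for three terms and already 1023 for ten.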
