Posted to solr-user@lucene.apache.org by Robert Gründler <ro...@dubture.com> on 2010/11/24 13:13:48 UTC

Is this sort order possible in a single query?

Hi,

We have a requirement for one of our search results that calls for a fairly complex sorting strategy. Let me explain the document first, using an example:

The document is a book. It has several indexed text fields: Title, Author, Distributor. It also has two integer fields: one holds the number of copies sold (num_copies), the other the number of comments on the website (num_comments).

The requirement for the relevancy looks like this:

* Documents with an exact match in the "Author" field should be ranked highest, regardless of their values in the "num_copies" and "num_comments" fields.
* After the exact matches, sorting should be based on the value of "num_copies", but only for documents where this field is set.
* After the num_copies matches, sorting should be based on "num_comments".

I'm wondering if this kind of sort order can be implemented in a single query, or if I need to break it down into several queries and merge the results at the application level.

-robert



Re: Is this sort order possible in a single query?

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
> I can't see a way to do it without functionqueries at the moment, which
> doesn't mean there isn't any.

If you want to use the suggested sort method, you could probably sort by score first:
sort=score desc, num_copies desc, num_comments desc
To let the score be influenced by the exact author match only, you can set the boosts of all other fields to 0 (DisMax):
qf=author^0.0 title^0.0 author_exact
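
A minimal sketch of how these two parameters could be sent from application code, assuming the host, port and handler path used elsewhere in this thread. The helper name build_sorted_dismax_params is hypothetical, and the code only builds and URL-encodes the query string; it does not contact Solr.

```python
from urllib.parse import urlencode, parse_qs


def build_sorted_dismax_params(user_query: str) -> dict:
    """Assemble request parameters for the 'sort by score first' idea.

    Hypothetical helper: the field names come from the thread, the
    function itself is only an illustration.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        # Zero boosts are meant to leave author_exact as the only field
        # that contributes to the score.
        "qf": "author^0.0 title^0.0 author_exact",
        # Secondary keys only matter for documents with identical scores.
        "sort": "score desc, num_copies desc, num_comments desc",
    }


params = build_sorted_dismax_params("j.k. rowling")
query_string = urlencode(params)  # encodes spaces and '^' safely
print("http://localhost:8093/solr/select?" + query_string)
```

Note that num_copies and num_comments act purely as tie-breakers here: they only decide the order among documents whose scores are exactly equal.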


Personally I doubt that the requirements will give an excellent usability experience. If you do NOT get an exact author match, you will get a "bestseller" list, not taking into account your terms in the ranking at all.

Unless there are specific reasons for these requirements I'd recommend using rank boosts instead of simple sorting.

Boost author_exact very high using DisMax combined with a version of the author field with KeywordTokenizerFactory and LowerCaseFilterFactory, perhaps combined with PatternReplaceFilterFactory to normalize punctuation etc.

 <fieldType name="author_exact" class="solr.TextField">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement=" " replace="all" />
    </analyzer>
 </fieldType>
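
To see what this analyzer chain would index, here is a rough Python emulation of the three steps. It is only a sketch: the function name author_exact_analyze is made up for illustration, and Python's lower() and re.sub() stand in for the Solr factories, which is an assumption that holds for plain ASCII input.

```python
import re


def author_exact_analyze(value: str) -> str:
    """Approximate the author_exact analyzer chain on one field value."""
    # KeywordTokenizerFactory: the whole input stays a single token.
    token = value
    # LowerCaseFilterFactory: lowercase the token.
    token = token.lower()
    # PatternReplaceFilterFactory with pattern="([^a-z0-9])",
    # replacement=" ", replace="all": every other character becomes a space.
    token = re.sub(r"([^a-z0-9])", " ", token)
    return token


print(repr(author_exact_analyze("J.K. Rowling")))  # 'j k  rowling'
print(repr(author_exact_analyze("j.k.rowling")))   # 'j k rowling'
```

Because each character is replaced one-for-one, "J.K. Rowling" and "j.k.rowling" end up as different tokens (two spaces versus one). If such variants should count as the same author, a pattern that matches runs of characters, such as [^a-z0-9]+, might be worth trying; that variation is a suggestion, not something from the thread.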

Then boost num_copies and num_comments using FunctionQueries in such a way that sold copies count more than comments.

Example:
http://localhost:8093/solr/select?defType=dismax&q=j.k. rowling&qf=author^5.0 author_exact^1000.0 title^10.0 descr^0.2&bf=log(sum(num_copies,1))^1000.0 log(sum(num_comments,1))^100.0

Another hint for this kind of field: disable field length normalization (omitNorms="true").
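
The arithmetic behind the two function terms in the example can be sketched as follows. This assumes Solr's log() is the base-10 logarithm and ignores any further normalization Solr applies to the final score. popularity_boost is a hypothetical name used only here.

```python
import math


def popularity_boost(num_copies: int, num_comments: int) -> float:
    """Weighted sum of the two function terms from the example.

    log(sum(field,1)) is modelled as log10(field + 1), so a value of 0
    contributes nothing and growth is dampened for large counts.
    """
    copies_part = 1000.0 * math.log10(num_copies + 1)
    comments_part = 100.0 * math.log10(num_comments + 1)
    return copies_part + comments_part


# 99 sold copies outweigh 99 comments by a factor of ten.
print(popularity_boost(99, 0))
print(popularity_boost(0, 99))
```

The +1 inside the logarithm keeps documents with a count of 0 from producing log(0), and the 1000 versus 100 weights are what make sold copies count more than comments.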

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


Re: Is this sort order possible in a single query?

Posted by Geert-Jan Brits <gb...@gmail.com>.
Hmm, sorry about that. I haven't used the 'sort by functionquery' option
myself, but I remembered it existed.
Indeed, Solr 1.5 was never released (as you've read on the wiki page you
pointed out).

The relevant JIRA issue: https://issues.apache.org/jira/browse/SOLR-1297

There's some recent activity there and a final post suggesting the patch
works (presumably on 3.1 and/or 4.x).
Neither branch has been released yet, although 3.1 should be pretty close
(and perhaps stable enough). I'm just not sure.

Your best bet is to start a new thread asking which branch to apply the
SOLR-1297 patch to, and asking the subjective 'is it stable enough?'.

Hope that helps some,
Geert-Jan



Re: Is this sort order possible in a single query?

Posted by Robert Gründler <ro...@dubture.com>.
Thanks a lot for the explanation. I'm a little confused about Solr 1.5, especially
after finding this wiki page:

http://wiki.apache.org/solr/Solr1.5

Is there a stable build available for version 1.5, so I can test your suggestion
using a functionquery?


-robert





Re: Is this sort order possible in a single query?

Posted by Geert-Jan Brits <gb...@gmail.com>.
You could do it by sorting on a functionquery (which is supported from
Solr 1.5):
http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
Consider the search:
http://localhost:8093/solr/select?q=author:'j.k.rowling'

sorting like you specified would involve:

1. Introduce an extra field, 'author_exact', of type 'string', which takes
care of the exact matching. (You can populate it by defining it as a
copyField of Author, so your indexing code doesn't change.)
2. Set sortMissingLast="true" for 'num_copies' and 'num_comments',
like:  <fieldType name="num_copies" sortMissingLast="true".... >

This makes sure that documents which don't have a value set end up at the
end of the sort when sorted on that particular field.
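
The effect of sortMissingLast can be pictured with a small Python sketch. It only imitates the behaviour on an in-memory list; sort_missing_last and the sample documents are made up for illustration.

```python
def sort_missing_last(docs, field, descending=True):
    """Sort docs on one field, keeping docs without a value at the end."""
    present = [d for d in docs if d.get(field) is not None]
    missing = [d for d in docs if d.get(field) is None]
    present.sort(key=lambda d: d[field], reverse=descending)
    # Documents without a value always trail, whatever the direction.
    return present + missing


docs = [
    {"id": 1, "num_copies": 5},
    {"id": 2},                      # num_copies not set
    {"id": 3, "num_copies": 9},
]

print([d["id"] for d in sort_missing_last(docs, "num_copies")])
print([d["id"] for d in sort_missing_last(docs, "num_copies", descending=False)])
```

In both directions the document without num_copies stays last, which is the behaviour the second requirement ("only for documents where this field is set") relies on.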

3. Construct a functionquery that scores either 0 (no match) or x (not sure
what x is (1?), but it should always be the same for all exact matches).

This gives

http://localhost:8093/solr/select?q=author:'j.k.rowling'&sort=query({!dismax qf=author_exact v='j.k.rowling'}) desc

which sorts all exact matches before all partial matches.

4. Now just append the other sorts, giving:

http://localhost:8093/solr/select?q=author:'j.k.rowling'&sort=query({!dismax qf=author_exact v='j.k.rowling'}) desc, num_copies desc, num_comments desc

That should do it.

Please note that 'num_copies' and 'num_comments' still kick in to break the
tie for documents that exactly match on 'author_exact'. I assume this is
ok.
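
The resulting order, including the tie-breaking among exact matches, can be sketched in plain Python. This is only an in-memory model of the intended ranking, not what Solr executes: the exact-match score is assumed to be 1.0 (the "x" above), and sort_key, rank and the sample books are invented for illustration.

```python
def sort_key(doc, exact_author):
    """Build a tuple that sorts the way the combined sort parameter should."""
    # Step 3: constant score for exact matches, 0 otherwise.
    exact_score = 1.0 if doc.get("author_exact") == exact_author else 0.0
    copies = doc.get("num_copies")
    comments = doc.get("num_comments")
    return (
        -exact_score,          # exact matches first (descending score)
        copies is None,        # missing num_copies goes last
        -(copies or 0),        # then num_copies descending
        comments is None,      # missing num_comments goes last
        -(comments or 0),      # then num_comments descending
    )


def rank(docs, exact_author):
    return sorted(docs, key=lambda d: sort_key(d, exact_author))


books = [
    {"id": "a", "author_exact": "j.k.rowling", "num_copies": 10, "num_comments": 1},
    {"id": "b", "author_exact": "j.k.rowling and others", "num_copies": 500, "num_comments": 7},
    {"id": "c", "author_exact": "j.k.rowling", "num_copies": 90, "num_comments": 0},
    {"id": "d", "author_exact": "someone else", "num_comments": 40},
]

print([b["id"] for b in rank(books, "j.k.rowling")])
```

Books "c" and "a" both match exactly, so num_copies decides between them, while "b" stays below both despite its higher sales, and "d" (no num_copies) comes last.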

I can't see a way to do it without functionqueries at the moment, which
doesn't mean there isn't any.

Hope that helps,

Geert-Jan







query({!dismax qf=text v='solr rocks'})



