Posted to java-user@lucene.apache.org by Claude Lepere <cl...@gmail.com> on 2022/02/21 09:56:18 UTC

Custom scores and sort

Hi! I have a question about sorting: I don’t understand why, in one test,
a hit with a lower score is ranked before hits with higher scores.

I am using Lucene 5.2.1.

I have two CustomScoreQuery subqueries on two fields, subquery 1 and
subquery 2, and two test cases:

case 1: at the end of the customScore method of CustomScoreProvider, the
two calculated custom scores are multiplied by the same factor, which
depends on the date of the matching document.

case 2: the two calculated custom scores are *not* multiplied by the date
factor.
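To make case 1 concrete, here is a minimal plain-Java sketch of a date factor applied to an already computed custom score. The post does not show how the factor is computed, so the exponential decay, the 30-day half-life and the names DateFactorSketch, dateFactor and customScoreWithDate are all hypothetical; in the real code the multiplication would sit at the end of the overridden customScore method of CustomScoreProvider.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical recency factor for case 1 (not the formula from the post):
 * a document loses half of its weight for every HALF_LIFE_DAYS of age.
 */
final class DateFactorSketch {
    // Assumed half-life, purely illustrative.
    static final double HALF_LIFE_DAYS = 30.0;

    private DateFactorSketch() {}

    /** Returns 1.0 for a document dated "now", 0.5 one half-life earlier, and so on. */
    static float dateFactor(long docDateMillis, long nowMillis) {
        long ageMillis = Math.max(0L, nowMillis - docDateMillis); // future dates count as age 0
        double ageDays = ageMillis / (double) TimeUnit.DAYS.toMillis(1);
        return (float) Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
    }

    /** Case 1: scale the score computed by customScore(...) by the document's date factor. */
    static float customScoreWithDate(float computedScore, long docDateMillis, long nowMillis) {
        return computedScore * dateFactor(docDateMillis, nowMillis);
    }
}
```

Case 2 corresponds to returning the computed score unchanged.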



All tests with the same Sort, by score then by date.
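For reference, the order that Sort is expected to produce can be modelled in plain Java: the score is the primary key, highest first (as with SortField.FIELD_SCORE), and the date only breaks ties between equal scores. This is a sketch rather than Lucene code: Hit is a hypothetical stand-in for a FieldDoc, and the ascending date direction is an assumption, since the post does not say which direction is used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ScoreThenDateSort {
    private ScoreThenDateSort() {}

    /** Hypothetical stand-in for a Lucene FieldDoc: an id, the hit's score and its date. */
    static final class Hit {
        final String id;
        final float score;
        final long date;

        Hit(String id, float score, long date) {
            this.id = id;
            this.score = score;
            this.date = date;
        }
    }

    // Primary key: score, highest first. Secondary key: date, oldest first (assumed direction).
    static final Comparator<Hit> SCORE_THEN_DATE =
            Comparator.comparingDouble((Hit h) -> h.score)
                    .reversed()
                    .thenComparingLong(h -> h.date);

    /** Returns the hit ids in the order the [score, date] sort would rank them. */
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(SCORE_THEN_DATE);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

Under this ordering, a hit can only precede a hit with a higher score if the value being compared is not the score the hit actually received.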

Case 1: with date factor:

Test 1: subquery 1 only:

two hits: doc A (date A) gets score A1 and doc B (date B) gets score B1.
Score A1 > score B1, date A < date B, and doc A is ranked before doc B.

Explanation:

doc A score A1 shardIndex=0 fields=[score A1, date A]

doc B score B1 shardIndex=0 fields=[score B1, date B]

That's correct.

Test 2: MUST query on subquery 1 and subquery 2:

the same two docs match: doc A (date A) gets score A2 and doc B (date B)
gets score B2. Score A2 *<* score B2, date A < date B, and *doc A is
ranked before doc B*.

Explanation:

doc A score A2 shardIndex=0 fields=[score A1, date A]

doc B score B2 shardIndex=0 fields=[score B1, date B]

*doc A is ranked before doc B although score A2 < score B2: the sort
should use scores A2 and B2, not A1 and B1.*

Case 2: without date factor:

Test 1: subquery 1 only:

doc A (date A) gets score A1 and doc B (date B) gets score B1. Score
A1 > score B1, date A < date B, and doc A is ranked before doc B.

Explanation:

doc A score A1 shardIndex=0 fields=[score A1, date A]

doc B score B1 shardIndex=0 fields=[score B1, date B]

Test 2: MUST query on subquery 1 and subquery 2:

the same two docs match: doc A (date A) gets score A2 and doc B (date B)
gets score B2. Score A2 *>* score B2, date A < date B, and doc A is
ranked before doc B.

Explanation:

doc A score A2 shardIndex=0 fields=[score A1, date A]

doc B score B2 shardIndex=0 fields=[score B1, date B]

Sorting on the test 1 scores (A1, B1) still gives the right order here:
without the date factor, all the hits of test 2 match subquery 2 in the
same way and get the same sub-score. The explanation shows that, in this
case, score = field[0] score + the sub-score common to all hits, so
sorting by the current score gives the same order as sorting by the
field[0] score.

But with the date factor this no longer holds: the [score, date] sort
should use the current scores of test 2, not those of test 1.
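A small worked example, with invented numbers, shows why the date factor breaks the argument above. It is a simplified plain-Java model, not Lucene's actual formula: the MUST query score is taken as the plain sum of the two clause scores (ignoring query normalization and coord), doc A is older and gets a hypothetical date factor of 0.5, doc B gets 1.0, and both docs get the same raw sub-score 2.0 from subquery 2. DateFactorFlip and its methods are hypothetical names.

```java
/**
 * Simplified model of the two tests: test 1 scores subquery 1 alone, test 2 sums
 * both clause scores. Every clause score of a document is multiplied by that
 * document's date factor (use 1.0f to model case 2, without the factor).
 */
final class DateFactorFlip {
    private DateFactorFlip() {}

    /** Test 1: subquery 1 only. */
    static float test1(float s1, float factor) {
        return s1 * factor;
    }

    /** Test 2: both MUST clauses, each scaled by the same per-document factor. */
    static float test2(float s1, float s2, float factor) {
        return s1 * factor + s2 * factor;
    }
}
```

With these numbers, test 1 gives A1 = test1(4.0f, 0.5f) = 2.0 > B1 = test1(1.5f, 1.0f) = 1.5, but test 2 gives A2 = test2(4.0f, 2.0f, 0.5f) = 3.0 < B2 = test2(1.5f, 2.0f, 1.0f) = 3.5, so the two orders disagree. With a factor of 1.0 for both docs (case 2), test 1 gives 4.0 and 1.5 and test 2 gives 6.0 and 3.5: each test 2 score is the test 1 score plus the common 2.0, and the order is unchanged.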

Could someone please enlighten me? Am I making a mistake somewhere?

Claude Lepère

Re: Custom scores and sort

Posted by pa...@hotmail.com.
Hello Claude,

here is what I'm doing, and it seems to work, although I haven't written
failure tests yet. Perhaps a more expert member will have more information.

Date field inserted:
final Date parse = DATE_FORMAT.parse(DATE_FORMAT.format(o1));
// doc is the Document being indexed (variable name assumed)
doc.add(new LongPoint(attributeName, parse.getTime()));
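The DATE_FORMAT.parse(DATE_FORMAT.format(o1)) round trip drops everything finer than the format's precision. The reply does not show how DATE_FORMAT is defined, so this plain-Java sketch assumes a seconds-precision pattern (yyyy-MM-dd HH:mm:ss, in UTC); the DateRoundTrip class and its methods are hypothetical. Note also that LongPoint only exists from Lucene 6.0 on, so this indexing code does not apply as-is to Lucene 5.2.1.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper mirroring the format-then-parse round trip from the reply. */
final class DateRoundTrip {
    private DateRoundTrip() {}

    // Assumed pattern: the real DATE_FORMAT is not shown. SimpleDateFormat is not
    // thread-safe, so a fresh instance is created per call.
    static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    /** Formats and re-parses the date; with this pattern the milliseconds are lost. */
    static long truncatedMillis(Date date) {
        SimpleDateFormat format = newFormat();
        try {
            return format.parse(format.format(date)).getTime();
        } catch (ParseException e) {
            // Should not happen: the string was produced by the same formatter.
            throw new IllegalStateException(e);
        }
    }
}
```

With such a pattern, two records created within the same second would end up with the same indexed date, which is consistent with inserting the test records one second apart.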


The sorter:
Sort sort = new Sort(SortField.FIELD_SCORE, new SortField(LAST_UPDATE,
SortField.Type.STRING));
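This sort uses SortField.Type.STRING for LAST_UPDATE, while the value indexed in the code above is a long. A STRING sort compares values lexicographically, so dates only sort chronologically when they are written as fixed-width, most-significant-first strings. The plain-Java sketch here illustrates this; the class and method names and the yyyyMMddHHmmss pattern are assumptions, not taken from the reply. In Lucene itself, a LongPoint on its own cannot be sorted on: sorting needs a doc-values field with the same name (for a STRING sort, a SortedDocValuesField).

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helpers showing how string keys order under a lexicographic (STRING) sort. */
final class StringDateOrder {
    private StringDateOrder() {}

    /** Fixed-width, most-significant-first key: lexicographic order equals chronological order. */
    static String sortableKey(long epochMillis) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(epochMillis));
    }

    /** Sorts keys the way a STRING sort compares them: lexicographically. */
    static List<String> lexicographic(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }
}
```

With a day-first pattern such as dd/MM/yyyy, 01/03/2022 sorts before 21/02/2022 even though it is the later date, whereas yyyyMMddHHmmss keys sort in date order.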

The query:
TopDocs docs = searcher.search(q, maxCount, sort);

The records are inserted one second apart (for test purposes only).

Stephane