Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2018/09/03 15:59:01 UTC
[jira] [Commented] (LUCENE-8340) Allow to boost by recency

    [ https://issues.apache.org/jira/browse/LUCENE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602299#comment-16602299 ] 

Adrien Grand commented on LUCENE-8340:
--------------------------------------

I went back to this patch and did some testing with the wikimedium10m dataset and the following query (note that I had to hack the indexing so that "lastModNDV" is also indexed as a LongPoint):
{code:java}
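Query boostedQ = new BooleanQuery.Builder()
    // required clause: documents must contain the term "ref" in "body"
    .add(new TermQuery(new Term("body", "ref")), Occur.MUST)
    // optional clause: boost of up to 1 for timestamps close to 1335997132000L;
    // the last argument is a distance of 1 day, in milliseconds
    .add(LongPoint.newDistanceFeatureQuery("lastModNDV", 1f, 1335997132000L, 24 * 3600 * 1000), Occur.SHOULD)
    .build();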
{code}
The maximum score of the term query is 2.07. The maximum score of the distance query is 1, and there are 582,764 documents whose timestamp is in [1335997132000L - 24 * 3600 * 1000, 1335997132000L + 24 * 3600 * 1000], meaning their score is in [0.5, 1].
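
To make the [0.5, 1] range concrete, here is a minimal sketch of the scoring arithmetic, assuming the distance query scores a document as weight * pivot / (pivot + |value - origin|). The class and method names are hypothetical and this is plain arithmetic, not Lucene code:

```java
public class DistanceFeatureScoreSketch {

    /**
     * Assumed scoring formula: weight * pivot / (pivot + |value - origin|).
     * The score is `weight` at the origin and `weight / 2` at exactly one
     * pivot distance away from it.
     */
    public static double score(double weight, long origin, long pivotDistance, long value) {
        double distance = Math.abs((double) value - (double) origin);
        return weight * pivotDistance / (pivotDistance + distance);
    }

    public static void main(String[] args) {
        long origin = 1335997132000L;      // origin timestamp from the query above
        long pivot = 24L * 3600 * 1000;    // 1 day in milliseconds

        System.out.println(score(1.0, origin, pivot, origin));             // 1.0
        System.out.println(score(1.0, origin, pivot, origin - pivot));     // 0.5
        System.out.println(score(1.0, origin, pivot, origin + 3 * pivot)); // 0.25
    }
}
```

Under that assumption, any document within one day of the origin scores at least 0.5 on the distance clause, which matches the range quoted above.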

When computing the top 10 matches and counting hits, all 3,793,973 hits must be visited and points are never read. This takes about 99ms.
When computing the top 10 matches without counting hits (totalHitsThreshold=1), only 264,802 hits are collected (about 7% of matches) and the query runs in 29ms.
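
One way to see why fewer hits need to be collected when hits are not counted: once the collector knows the minimum score a hit needs to enter the top 10, that threshold can be turned into a maximum distance from the origin, and documents outside that range can be skipped. The sketch below only inverts the scoring formula assumed to be weight * pivot / (pivot + distance); the names are hypothetical and the patch may well do this differently:

```java
public class CompetitiveDistanceSketch {

    /**
     * Largest distance from the origin at which the distance clause can still
     * contribute at least minFeatureScore, obtained by solving
     * weight * pivot / (pivot + d) >= minFeatureScore for d.
     * Returns -1 when no distance can reach the requested score.
     */
    public static double maxCompetitiveDistance(double weight, long pivotDistance, double minFeatureScore) {
        if (minFeatureScore <= 0) {
            return Double.POSITIVE_INFINITY; // every document is still competitive
        }
        if (minFeatureScore > weight) {
            return -1; // even a document at the origin scores too low
        }
        return pivotDistance * (weight / minFeatureScore - 1);
    }

    public static void main(String[] args) {
        long pivot = 24L * 3600 * 1000; // 1 day in milliseconds
        // Needing at least 0.5 from the distance clause restricts matches to +/- 1 day.
        System.out.println(maxCompetitiveDistance(1.0, pivot, 0.5));  // 8.64E7
        // Needing only 0.25 widens the range to +/- 3 days.
        System.out.println(maxCompetitiveDistance(1.0, pivot, 0.25)); // 2.592E8
    }
}
```

As the required score rises, the range of timestamps worth reading from the points index shrinks, which is consistent with only a fraction of the matches being collected.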

If I switch to more costly queries that have fewer hits, the speedup decreases and unfortunately can even become a slowdown. That said, I don't think this should prevent us from adding something like this, which is a useful addition to the scoring toolbox.

> Allow to boost by recency
> -------------------------
>
>                 Key: LUCENE-8340
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8340
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8340.patch
>
>
> I would like us to support something like {{FeatureField.newSaturationQuery}} that works with features that are computed dynamically, like recency or geo-distance, and is still optimized for top-hits collection. I'm starting with recency because it makes things a bit easier, even though I suspect that geo-distance might be a more common need.


