Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/10/16 16:43:46 UTC
[jira] [Created] (LUCENE-5288) Add ProxBooleanTermQuery, like
BooleanQuery but boosting when term occur "close" together (in proximity)
in each document
Michael McCandless created LUCENE-5288:
------------------------------------------
Summary: Add ProxBooleanTermQuery, like BooleanQuery but boosting when term occur "close" together (in proximity) in each document
Key: LUCENE-5288
URL: https://issues.apache.org/jira/browse/LUCENE-5288
Project: Lucene - Core
Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.6, 5.0
This is very much a work in progress, tons of nocommits... It adds two classes:
* ProxBooleanTermQuery: essentially a BooleanQuery (same
matching/scoring), except that for each matching doc the positions
are merge-sorted and scored to "boost" the document's score.
Currently all clauses must be TermQuery, and only Occur.SHOULD is
supported.
* QueryRescorer: simple API to re-score top hits using a different
query. Because ProxBooleanTermQuery is so costly, apps would
normally run an "ordinary" BooleanQuery across the full index, to
get the top few hundred hits, and then rescore using the more
costly ProxBooleanTermQuery (or other costly queries).
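The two-pass flow described above might look roughly like the following. This is a minimal standalone sketch, not the QueryRescorer API from the patch: the class name TwoPassRescoreSketch, the Hit type, the costlyScore callback and the additive "first-pass score + weight * second-pass score" combination are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Hypothetical illustration of cheap-first-pass, costly-second-pass rescoring. */
final class TwoPassRescoreSketch {

  /** A (doc id, score) pair; stands in for a search hit. */
  static final class Hit {
    final int doc;
    final double score;

    Hit(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /**
   * Keeps the topN hits of the first pass, adds weight * costlyScore(doc) to
   * each of them, and returns them re-sorted by the combined score.
   * The additive combination is an assumption, not the patch's behavior.
   */
  static List<Hit> rescore(List<Hit> firstPass, int topN,
                           IntToDoubleFunction costlyScore, double weight) {
    Comparator<Hit> byScoreDesc =
        Comparator.comparingDouble((Hit h) -> h.score).reversed();

    // Sort a copy so the caller's list is left untouched.
    List<Hit> sorted = new ArrayList<>(firstPass);
    sorted.sort(byScoreDesc);

    // Only the best topN hits pay for the costly second-pass score.
    List<Hit> rescored = new ArrayList<>();
    for (Hit h : sorted.subList(0, Math.min(topN, sorted.size()))) {
      rescored.add(new Hit(h.doc, h.score + weight * costlyScore.applyAsDouble(h.doc)));
    }
    rescored.sort(byScoreDesc);
    return rescored;
  }
}
```

The point of the shape is that the costly scorer runs at most topN times per search, regardless of how many docs the first pass matched.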
I'm not sure how to actually compute the appropriate prox boost (this
is the hard part!!) and I've completely punted on that in the current
patch (it's just a hack now), but the patch does all the "mechanics"
to merge/visit all the positions in order per hit.
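For readers who want a picture of the merge/visit step, here is a minimal standalone sketch of a k-way merge over per-term position lists for a single hit. It does not use Lucene's postings APIs and is not the code in the patch; PositionMerger, Occurrence and the int[][] input are hypothetical stand-ins, and each term's positions are assumed to already be in ascending order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: merges per-term position lists of one document. */
final class PositionMerger {

  /** One occurrence of a query term at a token position in the document. */
  static final class Occurrence {
    final int termIndex; // index of the term within the query
    final int position;  // token position within the document

    Occurrence(int termIndex, int position) {
      this.termIndex = termIndex;
      this.position = position;
    }
  }

  /**
   * positions[i] holds the ascending positions of query term i in one
   * document. Returns all occurrences in ascending position order.
   */
  static List<Occurrence> merge(int[][] positions) {
    // Each queue entry is {termIndex, offset into positions[termIndex]},
    // ordered by the position that the entry currently points at.
    PriorityQueue<int[]> queue = new PriorityQueue<>(
        (a, b) -> Integer.compare(positions[a[0]][a[1]], positions[b[0]][b[1]]));

    // Seed the queue with the first position of every term that occurs.
    for (int term = 0; term < positions.length; term++) {
      if (positions[term].length > 0) {
        queue.add(new int[] {term, 0});
      }
    }

    List<Occurrence> merged = new ArrayList<>();
    while (!queue.isEmpty()) {
      int[] head = queue.poll();
      merged.add(new Occurrence(head[0], positions[head[0]][head[1]]));
      // Advance this term to its next position, if it has one.
      if (head[1] + 1 < positions[head[0]].length) {
        queue.add(new int[] {head[0], head[1] + 1});
      }
    }
    return merged;
  }
}
```

With k query terms and n total occurrences in the doc, this visits the positions in order in O(n log k) time, which is part of why it only makes sense on a small set of top hits.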
Maybe we could score the way SpanNearQuery or sloppy PhraseQuery
does, or maybe follow this paper:
http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf
which Rob also used in LUCENE-4909 to add proximity scoring to
PostingsHighlighter. Maybe we need to make it (how the prox boost is
computed/folded in) somehow pluggable ...
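To make the "pluggable" idea concrete, here is one possible shape, purely as a sketch. The ProxBoost interface and InverseSquareProxBoost class are hypothetical names, not part of the patch. The example implementation rewards adjacent occurrences of different query terms by the inverse square of their distance; that is only loosely in the spirit of distance-based proximity scoring and is not a faithful implementation of the paper linked above (it ignores per-term weights and any score normalization).

```java
/**
 * Hypothetical plug-in point: computes a proximity boost for one document
 * from its query-term occurrences, given in ascending position order.
 */
interface ProxBoost {
  /**
   * termIndexes[i] is the query term that occurs at positions[i];
   * both arrays have the same length and are sorted by position.
   */
  double boost(int[] termIndexes, int[] positions);
}

/** Example implementation: sums 1/dist^2 over adjacent pairs of different terms. */
final class InverseSquareProxBoost implements ProxBoost {
  @Override
  public double boost(int[] termIndexes, int[] positions) {
    double sum = 0.0;
    for (int i = 1; i < positions.length; i++) {
      // Repeats of the same term say nothing about cross-term proximity.
      if (termIndexes[i] == termIndexes[i - 1]) {
        continue;
      }
      int dist = positions[i] - positions[i - 1];
      // Skip terms indexed at the same position to avoid dividing by zero.
      if (dist > 0) {
        sum += 1.0 / ((double) dist * dist);
      }
    }
    return sum;
  }
}
```

How the returned value is folded into the document's score (added, multiplied, capped) is left open here, since that is exactly the undecided part.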