Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/10/16 16:43:46 UTC

[jira] [Created] (LUCENE-5288) Add ProxBooleanTermQuery, like BooleanQuery but boosting when terms occur "close" together (in proximity) in each document

Michael McCandless created LUCENE-5288:
------------------------------------------

             Summary: Add ProxBooleanTermQuery, like BooleanQuery but boosting when terms occur "close" together (in proximity) in each document
                 Key: LUCENE-5288
                 URL: https://issues.apache.org/jira/browse/LUCENE-5288
             Project: Lucene - Core
          Issue Type: New Feature
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 4.6, 5.0


This is very much a work in progress, tons of nocommits...  It adds two classes:

  * ProxBooleanTermQuery: essentially a BooleanQuery (same
    matching/scoring; currently, all clauses must be TermQuery, and
    only Occur.SHOULD is supported), except that for each matching
    doc the positions are merge-sorted and scored to "boost" the
    document's score

  * QueryRescorer: simple API to re-score top hits using a different
    query.  Because ProxBooleanTermQuery is so costly, apps would
    normally run an "ordinary" BooleanQuery across the full index, to
    get the top few hundred hits, and then rescore using the more
    costly ProxBooleanTermQuery (or other costly queries).
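
The two-pass idea can be sketched in plain Java, with no Lucene
dependency.  This is only an illustration of the flow, not the patch's
API: the names RescoreSketch, Hit and rescore, the costlyScore callback
and the additive "weight" combination are all assumptions made up for
this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class RescoreSketch {
    /** A first-pass hit: a document id plus its first-pass score. (Hypothetical type.) */
    public static final class Hit {
        public final int doc;
        public final double score;

        public Hit(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Re-scores only the hits returned by the cheap first pass.
     * The costly scorer runs once per first-pass hit, never across the full index.
     * Combining the scores as first + weight * second is an assumption.
     */
    public static List<Hit> rescore(List<Hit> firstPass,
                                    IntToDoubleFunction costlyScore,
                                    double weight,
                                    int topN) {
        List<Hit> out = new ArrayList<>(firstPass.size());
        for (Hit h : firstPass) {
            double combined = h.score + weight * costlyScore.applyAsDouble(h.doc);
            out.add(new Hit(h.doc, combined));
        }
        // Highest combined score first; ties broken by ascending doc id.
        out.sort(Comparator.comparingDouble((Hit h) -> h.score)
                .reversed()
                .thenComparingInt(h -> h.doc));
        return new ArrayList<>(out.subList(0, Math.min(topN, out.size())));
    }
}
```

An app would pass in the top few hundred hits from the ordinary query
as firstPass, and keep only the topN it wants to display.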

I'm not sure how to actually compute the appropriate prox boost (this
is the hard part!!) and I've completely punted on that in the current
patch (it's just a hack now), but the patch does all the "mechanics"
to merge/visit all the positions in order per hit.
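
For illustration, merging the per-term positions of one hit into a
single in-order stream can be sketched as a k-way merge over a heap.
This is a standalone sketch, not the patch's code: it assumes each
term's positions are already available as a sorted int[] (in Lucene
they would be pulled from the postings), and the names
PositionMergeSketch, Occurrence and mergePositions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class PositionMergeSketch {
    /** One occurrence: which query term (by index) appeared at which position. */
    public static final class Occurrence {
        public final int termIndex;
        public final int position;

        public Occurrence(int termIndex, int position) {
            this.termIndex = termIndex;
            this.position = position;
        }
    }

    /**
     * Merges the sorted position arrays of all query terms for one document.
     * positionsPerTerm[t] holds the ascending positions of term t in that doc.
     */
    public static List<Occurrence> mergePositions(int[][] positionsPerTerm) {
        // Heap entries are {position, termIndex, offset into that term's array},
        // ordered by position and then by term index.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
            a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));

        // Seed the heap with the first position of every term that occurs.
        for (int t = 0; t < positionsPerTerm.length; t++) {
            if (positionsPerTerm[t].length > 0) {
                heap.add(new int[] {positionsPerTerm[t][0], t, 0});
            }
        }

        List<Occurrence> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(new Occurrence(top[1], top[0]));
            // Advance the term we just consumed, if it has more positions.
            int next = top[2] + 1;
            int[] source = positionsPerTerm[top[1]];
            if (next < source.length) {
                heap.add(new int[] {source[next], top[1], next});
            }
        }
        return merged;
    }
}
```

The merged list is what a prox scorer would then walk, looking at how
far apart consecutive occurrences of different terms are.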

Maybe we could score the way SpanNearQuery or sloppy PhraseQuery
would, or maybe use the approach from this paper:

  http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf

which Rob also used in LUCENE-4909 to add proximity scoring to
PostingsHighlighter.  Maybe we need to make it (how the prox boost is
computed/folded in) somehow pluggable ...
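
One possible shape for such a plug-in point, purely as a sketch: a
small interface that receives the in-order occurrences for a hit and
returns a boost.  Nothing here is from the patch.  The names
ProxBoostSketch, ProxBoost, INVERSE_SQUARE and combine are
hypothetical, and both the inverse-square distance heuristic and the
multiplicative fold-in are assumptions chosen only to show the shape.

```java
public class ProxBoostSketch {
    /**
     * Hypothetical plug-in point: given the occurrences of the query terms in
     * one document, in ascending position order, return a proximity boost.
     * termIndexes[i] is the query term found at positions[i].
     */
    public interface ProxBoost {
        double boost(int[] termIndexes, int[] positions);
    }

    /**
     * Example implementation (an assumption, not the patch's scoring):
     * every adjacent pair of occurrences of two different terms, d positions
     * apart, contributes 1 / d^2, so closer pairs count for much more.
     */
    public static final ProxBoost INVERSE_SQUARE = (termIndexes, positions) -> {
        double sum = 0.0;
        for (int i = 1; i < positions.length; i++) {
            if (termIndexes[i] != termIndexes[i - 1]) {
                int d = positions[i] - positions[i - 1];
                if (d > 0) {
                    sum += 1.0 / ((double) d * d);
                }
            }
        }
        return sum;
    };

    /** Folds the boost into the base score multiplicatively (also an assumption). */
    public static double combine(double baseScore, double proxBoost, double weight) {
        return baseScore * (1.0 + weight * proxBoost);
    }
}
```

With an interface like this, a SpanNearQuery-style or paper-based
scorer would just be another ProxBoost implementation.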



