Posted to dev@lucene.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/12/01 22:05:00 UTC

[jira] [Commented] (SOLR-11662) Make overlapping query term scoring configurable per field type

    [ https://issues.apache.org/jira/browse/SOLR-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275057#comment-16275057 ] 

ASF GitHub Bot commented on SOLR-11662:
---------------------------------------

Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/275#discussion_r154456181
  
    --- Diff: solr/core/src/test/org/apache/solr/search/TestSolrQueryParser.java ---
    @@ -1057,7 +1057,35 @@ public void testShingleQueries() throws Exception {
             , "/response/numFound==1"
         );
       }
    -  
    +
    +
    +  public void testSynonymQueryStyle() throws Exception {
    +    ModifiableSolrParams edismaxParams = params("qf", "t_pick_best_foo");
    +
    +    QParser qParser = QParser.getParser("tabby", "edismax", req(edismaxParams));
    --- End diff --
    
    Why not the default/lucene query parser?  That's what TestSolrQueryParser tests.


> Make overlapping query term scoring configurable per field type
> ---------------------------------------------------------------
>
>                 Key: SOLR-11662
>                 URL: https://issues.apache.org/jira/browse/SOLR-11662
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Doug Turnbull
>             Fix For: 7.2, master (8.0)
>
>
> This patch customizes the query-time behavior when query terms overlap positions. Right now the only option is SynonymQuery. This is a fantastic default and an improvement on past versions. However, there are use cases where terms overlap positions but don't carry exact synonymy relationships. Synonyms (or other analyzers) are often used to model hypernym/hyponym relationships. In these cases the individual term scores matter: terms with higher specificity (hyponyms) should score higher than terms with lower specificity (hypernyms).
> This patch adds the fieldType setting scoreOverlaps, as in:
> {code:java}
>   <fieldType name="text_general"  scoreOverlaps="pick_best"  class="solr.TextField" positionIncrementGap="100" multiValued="true">
> {code}
> Valid values for scoreOverlaps are:
> *as_one_term*
> The default, suitable for most synonym use cases. Uses SynonymQuery.
> Treats all terms as if they were exactly equivalent, blending the document frequencies of the underlying terms.
> *pick_best*
> For a given document, scores using the best-scoring synonym (i.e. a dismax over the generated terms).
> Useful when synonyms are not exactly equivalent but instead model hypernym/hyponym relationships, so that each term's score reflects how specific it is.
> For example, with this query-time expansion
> tabby => tabby, cat, animal
> searching for tabby in the "text" field generates the dismax (text:tabby | text:cat | text:animal)
> *as_distinct_terms*
> (The pre-6.0 behavior.)
> A compromise between pick_best and as_one_term.
> Appropriate when synonyms reflect a hypernym/hyponym relationship but scores should stack: the more of tabby, cat, or animal a document contains, the better it scores, with a bias towards the term with the highest specificity.
> Terms are turned into a boolean OR query, and their document frequencies are not blended.
> For example, with this query-time expansion
> tabby => tabby, cat, animal
> searching for tabby in the "text" field generates the boolean query (text:tabby text:cat text:animal)
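To make the difference between the three scoreOverlaps values concrete, here is a minimal, self-contained Java sketch. It is a toy model, not Lucene's scoring: it uses a plain tf*idf instead of BM25 and ignores norms. Modeling as_one_term as "sum the per-document term frequencies and share the largest document frequency" is a simplifying assumption meant only to illustrate blending. All names (ScoreOverlapsSketch, fromConfig, TermStats, termScore, render, doc) and the document-frequency numbers are hypothetical. The Synonym(...) label for as_one_term is placeholder notation. The dismax and boolean shapes follow the notation used in the issue text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

// Toy model of the three proposed scoreOverlaps styles. NOT Lucene's scoring.
public class ScoreOverlapsSketch {

    // Hypothetical enum mirroring the scoreOverlaps attribute values.
    enum ScoreOverlaps {
        AS_ONE_TERM, PICK_BEST, AS_DISTINCT_TERMS;

        // "pick_best" -> PICK_BEST; throws IllegalArgumentException on unknown values.
        static ScoreOverlaps fromConfig(String value) {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }

    // Stats for one expanded term: tf in a single document, df across the index.
    static final class TermStats {
        final String term;
        final int tf;
        final int df;

        TermStats(String term, int tf, int df) {
            this.term = term;
            this.tf = tf;
            this.df = df;
        }
    }

    // Toy tf*idf: rarer (more specific) terms get a larger idf.
    static double termScore(int tf, int df, int numDocs) {
        return tf * (Math.log((double) numDocs / df) + 1.0);
    }

    static double score(ScoreOverlaps style, List<TermStats> terms, int numDocs) {
        switch (style) {
            case AS_ONE_TERM: {
                // Blend (assumed model): sum the tfs and share the largest df, so all terms get one idf.
                int tf = 0, df = 0;
                for (TermStats t : terms) {
                    tf += t.tf;
                    df = Math.max(df, t.df);
                }
                return tf == 0 ? 0.0 : termScore(tf, df, numDocs);
            }
            case PICK_BEST: {
                // Dismax-like: keep only the best single-term score.
                double best = 0.0;
                for (TermStats t : terms) {
                    if (t.tf > 0) best = Math.max(best, termScore(t.tf, t.df, numDocs));
                }
                return best;
            }
            case AS_DISTINCT_TERMS: {
                // Boolean-OR-like: every matching term adds its own score.
                double sum = 0.0;
                for (TermStats t : terms) {
                    if (t.tf > 0) sum += termScore(t.tf, t.df, numDocs);
                }
                return sum;
            }
            default:
                throw new IllegalArgumentException("unknown style: " + style);
        }
    }

    // Renders the query shape in the notation used in the issue text.
    static String render(ScoreOverlaps style, String field, List<String> terms) {
        StringJoiner joiner;
        switch (style) {
            case PICK_BEST: joiner = new StringJoiner(" | ", "(", ")"); break;
            case AS_DISTINCT_TERMS: joiner = new StringJoiner(" ", "(", ")"); break;
            default: joiner = new StringJoiner(" ", "Synonym(", ")"); // placeholder notation only
        }
        for (String t : terms) joiner.add(field + ":" + t);
        return joiner.toString();
    }

    // Hypothetical df values in a 1000-doc index: tabby is rare, animal is common.
    static List<TermStats> doc(int tabbyTf, int catTf, int animalTf) {
        return Arrays.asList(new TermStats("tabby", tabbyTf, 10),
                new TermStats("cat", catTf, 100), new TermStats("animal", animalTf, 400));
    }

    public static void main(String[] args) {
        List<String> expansion = Arrays.asList("tabby", "cat", "animal");
        for (ScoreOverlaps style : ScoreOverlaps.values()) {
            System.out.printf("%-17s %-38s tabbyDoc=%.3f animalDoc=%.3f%n", style,
                    render(style, "text", expansion),
                    score(style, doc(1, 0, 0), 1000), score(style, doc(0, 0, 1), 1000));
        }
    }
}
```

In this toy model, a document that mentions only "tabby" and one that mentions only "animal" get identical scores under as_one_term, because the blended statistics erase the difference in specificity. Under pick_best the "tabby" document wins. Under as_distinct_terms, a document matching several of the expanded terms accumulates a score at least as high as its pick_best score, which is the "scores stack" behavior described above.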



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
