Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2017/12/04 18:38:00 UTC
[jira] [Resolved] (SOLR-11662) Make overlapping query term scoring configurable per field type

     [ https://issues.apache.org/jira/browse/SOLR-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley resolved SOLR-11662.
---------------------------------
    Resolution: Fixed
      Assignee: David Smiley

Thanks Doug!

BTW I have a question on the practical use of this option.  In the docs you mention that the default, as_same_term, is good for real synonyms and that the others are good for hyponyms.  Let's say the synonyms file has a mix of both (typical).  It seems impossible to use both, since the QueryBuilder passes no context other than the terms to build the query.  Do you recommend different analyzer chains, one with regular synonyms and another with hypernyms, perhaps via SOLR-11698?  Of course that'd be less efficient than one query with the right type of query per synonym clause, but that's elusive without some custom query parser that detects the types and handles it (not leveraging QueryBuilder, as it's not hackable).

> Make overlapping query term scoring configurable per field type
> ---------------------------------------------------------------
>
>                 Key: SOLR-11662
>                 URL: https://issues.apache.org/jira/browse/SOLR-11662
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Doug Turnbull
>            Assignee: David Smiley
>             Fix For: 7.2, master (8.0)
>
>
> This patch customizes the query-time behavior when query terms overlap positions. Right now the only option is SynonymQuery. This is a fantastic default & improvement on past versions. However, there are use cases where terms overlap positions but don't carry exact synonymy relationships. Often synonym filters (or other analyzers) are actually used to model hypernym/hyponym relationships. In that case the individual term scores matter, with terms of higher specificity (hyponyms) scoring higher than terms of lower specificity (hypernyms).
> This patch adds the fieldType setting scoreOverlaps, as in:
> {code:java}
>   <fieldType name="text_general"  scoreOverlaps="pick_best"  class="solr.TextField" positionIncrementGap="100" multiValued="true">
> {code}
> Valid values for scoreOverlaps are:
> *as_one_term*
> Default, most synonym use cases. Uses SynonymQuery
> Treats all terms as if they're exactly equivalent, with document frequency from underlying terms blended 
> *pick_best*
> For a given document, score using the best scoring synonym (i.e. dismax over generated terms).
> Useful when synonyms are not exactly equivalent and are instead used to model hypernym/hyponym relationships, so that each term's score reflects its specificity.
> For example, with this query-time expansion:
> tabby => tabby, cat, animal
> searching the "text" field for tabby generates the dismax query (text:tabby | text:cat | text:animal)
> *as_distinct_terms*
> (The pre-6.0 behavior.)
> Compromise between pick_best and as_one_term
> Appropriate when synonyms reflect a hypernym/hyponym relationship but scores should stack: the more occurrences of tabby, cat, or animal a document has, the better, with a bias towards the term with the highest specificity
> Terms are turned into a boolean OR query, with document frequencies not blended
> For example, with this query-time expansion:
> tabby => tabby, cat, animal
> searching the "text" field for tabby generates the boolean query (text:tabby text:cat text:animal)
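
To make the difference between the three values concrete, here is a minimal sketch in plain Java. It does not use Lucene, and everything in it is an assumption made for illustration: the class name OverlapScoringSketch, the collection size, the document frequencies, and especially the toy scoring function tf * ln(1 + N/df), which stands in for the real similarity. The "blended" document frequency for as_one_term is taken here as the maximum df of the overlapping terms, which is one plausible reading of the description above rather than a statement of what the patch does.

```java
import java.util.Arrays;

// Toy model only: score(term) = tf * ln(1 + N/df). This is NOT Lucene's BM25
// and not the patch's code; it only illustrates how the three styles combine
// per-term evidence for one document.
class OverlapScoringSketch {

    // Hypothetical collection size used by the toy idf.
    static final double NUM_DOCS = 1000.0;

    // Rarer terms (lower docFreq) get a higher weight.
    static double idf(double docFreq) {
        return Math.log(1.0 + NUM_DOCS / docFreq);
    }

    static double termScore(double termFreq, double docFreq) {
        return termFreq * idf(docFreq);
    }

    // as_one_term: treat all overlapping terms as a single pseudo-term.
    // Term frequencies are summed; the blended df is assumed to be the max df.
    static double asOneTerm(double[] tf, double[] df) {
        double totalTf = Arrays.stream(tf).sum();
        double blendedDf = Arrays.stream(df).max().orElse(1.0);
        return termScore(totalTf, blendedDf);
    }

    // pick_best: dismax over the generated terms, i.e. the best single term score.
    static double pickBest(double[] tf, double[] df) {
        double best = 0.0;
        for (int i = 0; i < tf.length; i++) {
            best = Math.max(best, termScore(tf[i], df[i]));
        }
        return best;
    }

    // as_distinct_terms: boolean OR, so the individual term scores add up.
    static double asDistinctTerms(double[] tf, double[] df) {
        double sum = 0.0;
        for (int i = 0; i < tf.length; i++) {
            sum += termScore(tf[i], df[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Order: tabby, cat, animal. Hypothetical document frequencies.
        double[] df = {10, 100, 500};
        double[] docA = {1, 1, 1}; // mentions tabby, cat and animal once each
        double[] docB = {0, 0, 3}; // mentions only animal, three times

        System.out.println("as_one_term       A=" + asOneTerm(docA, df) + " B=" + asOneTerm(docB, df));
        System.out.println("pick_best         A=" + pickBest(docA, df) + " B=" + pickBest(docB, df));
        System.out.println("as_distinct_terms A=" + asDistinctTerms(docA, df) + " B=" + asDistinctTerms(docB, df));
    }
}
```

In this toy setup as_one_term cannot tell the two documents apart, since both contain three occurrences of "the same" term, while pick_best and as_distinct_terms both favor the document that contains the more specific term tabby. as_distinct_terms additionally rewards that document for matching cat and animal.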



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
