Posted to issues@solr.apache.org by "Rudi Seitz (Jira)" <ji...@apache.org> on 2023/02/13 14:33:00 UTC

[jira] [Comment Edited] (SOLR-16652) multi-term synonym rule applied at query time prevents single-term matching

    [ https://issues.apache.org/jira/browse/SOLR-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687959#comment-17687959 ] 

Rudi Seitz edited comment on SOLR-16652 at 2/13/23 2:32 PM:
------------------------------------------------------------

If the original rule is "foo bar,baz", I believe Mikhail's suggestion to convert it to a directional rule would work, but we'd need two rules:

foo bar=>baz,foo,bar

baz=>baz,foo bar
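
To make the difference between the two rule styles concrete, here is a minimal Python sketch that turns synonyms.txt-style lines into a plain lookup table. It is only an illustration of the rule semantics: parse_rules is a hypothetical helper, not Solr's SynonymGraphFilter, and it assumes equivalence lists are fully expanded (as with expand=true) while ignoring escaping and the token graph.

```python
def parse_rules(lines):
    """Parse synonym rules into {input phrase: [output phrases]}.

    Hypothetical helper for illustration only; not how Solr parses or applies rules.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Directional rule: phrases on the left are replaced by those on the right.
            lhs, rhs = line.split("=>", 1)
            inputs = [p.strip() for p in lhs.split(",")]
            outputs = [p.strip() for p in rhs.split(",")]
        else:
            # Equivalence rule: every phrase maps to every phrase in the list
            # (assumption: full expansion, as with expand=true).
            inputs = outputs = [p.strip() for p in line.split(",")]
        for phrase in inputs:
            targets = table.setdefault(phrase, [])
            for out in outputs:
                if out not in targets:
                    targets.append(out)
    return table


directional = parse_rules(["foo bar=>baz,foo,bar", "baz=>baz,foo bar"])
equivalence = parse_rules(["foo bar,baz"])

print(directional["foo bar"])  # ['baz', 'foo', 'bar']
print(directional["baz"])      # ['baz', 'foo bar']
print(equivalence["foo bar"])  # ['foo bar', 'baz']
print("foo" in directional)    # False: a bare "foo" is not rewritten to "bar"
```

In this simplified model, the directional pair adds the single terms "foo" and "bar" as outputs for "foo bar", while a query for "foo" alone has no rule and so is not expanded to "bar".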


was (Author: JIRAUSER297477):
If the original rule is "foo bar,baz" I believe Mikhail's suggestion to convert it to a directional rule would work, but we'd need two of them:

foo bar=>baz,foo,bar

baz=>foo bar

> multi-term synonym rule applied at query time prevents single-term matching
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-16652
>                 URL: https://issues.apache.org/jira/browse/SOLR-16652
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 9.1
>            Reporter: Rudi Seitz
>            Priority: Major
>
> The presence of a multi-term synonym equivalence rule applied at query time prevents matching on individual terms in the synonym.
> If we issue an edismax query against a text_general field in Solr 9.1, and the query string is "foo bar", we can match documents that have "foo" without "bar" and vice versa. However, if there is a synonym rule like "foo bar,baz" applied at query time, we no longer get single-term matches against "foo" or "bar". Both terms are now required, but can occur in any position: a document can match the query if it contains "foo bar" or "bar foo" or "bar qux foo", for example, but not if it only contains "foo".
> However, if we change the text_general analysis chain to apply synonyms at index time, the observed behavior also changes and single-term matches for "foo" or "bar" are again possible.
> Why is this an issue? 1) It is counterintuitive that adding a synonym equivalence (as opposed to a unidirectional mapping) would give narrower recall than having no rule at all; 2) this behavior represents a discrepancy in semantics between index-time and query-time synonym expansion.
>  
> *STEPS TO REPRODUCE*
> Use the _default configset with "foo bar,baz" added to synonyms.txt. Index these four docs:
>  
> {"id":"1", "title_txt":"foo"}
>  
> {"id":"2", "title_txt":"bar"}
>  
> {"id":"3", "title_txt":"foo bar"}
>  
> {"id":"4", "title_txt":"bar foo"}
>  
>  
> Issue a query for "foo bar" (i.e. defType=edismax&q.op=OR&qf=title_txt&q=foo bar)
> Result: Only docs 3 and 4 come back
>  
> Issue a query for "bar foo"
> Result: All four docs come back; the synonym rule is not invoked
>  
> *OBSERVATIONS*
> Note that we could change the synonym rule to "foo bar,baz,foo,bar" but this would mean that a query for "foo" could now match a document containing only "bar", which is not the intent of the original rule.
> Note that we could set sow=true, but this would prevent the multi-term synonym from taking effect: the "foo bar" query could now get single-term matches on "foo" or "bar" but couldn't get a match on the synonym "baz".
>  
> Returning to the original "foo bar,baz" synonym rule with sow=false, if we look at the explain output for the "foo bar" query we see:
> {{+((title_txt:baz (+title_txt:foo +title_txt:bar)))}}
>  
> Looking at the explain output for "bar foo" we see:
> {{+((title_txt:bar) (title_txt:foo))}}
>  
> So, the observed behavior makes sense according to the low-level query structure, but is still counterintuitive for the reasons described above.
>  
> Why not expand the "foo bar" query like this instead?
>  
> {{+((title_txt:baz (title_txt:foo title_txt:bar)))}}
>  
>  
>  
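
The three query structures quoted in the description can be checked against the four sample documents with a small Python model. This is a rough sketch, not Lucene: each document is reduced to a set of tokens, a clause marked "+" is treated as required, unmarked sibling clauses are treated as alternatives, and scoring and term positions are ignored. The helper names (term, all_of, any_of, search) are made up for this illustration.

```python
# The four documents from STEPS TO REPRODUCE, keyed by id.
DOCS = {"1": "foo", "2": "bar", "3": "foo bar", "4": "bar foo"}


def term(t):
    """Clause that matches when the token t occurs in the document."""
    return lambda tokens: t in tokens


def all_of(*clauses):
    """Every clause is required, like +a +b."""
    return lambda tokens: all(c(tokens) for c in clauses)


def any_of(*clauses):
    """At least one clause must match, like (a b) with q.op=OR."""
    return lambda tokens: any(c(tokens) for c in clauses)


def search(query):
    """Return the ids of the documents matched by the query, in order."""
    return sorted(i for i, text in DOCS.items() if query(set(text.split())))


# +((title_txt:baz (+title_txt:foo +title_txt:bar)))   observed for "foo bar"
observed = any_of(term("baz"), all_of(term("foo"), term("bar")))
# +((title_txt:bar) (title_txt:foo))                   observed for "bar foo"
reversed_query = any_of(term("bar"), term("foo"))
# +((title_txt:baz (title_txt:foo title_txt:bar)))     proposed for "foo bar"
proposed = any_of(term("baz"), any_of(term("foo"), term("bar")))

print(search(observed))        # ['3', '4']
print(search(reversed_query))  # ['1', '2', '3', '4']
print(search(proposed))        # ['1', '2', '3', '4']
```

Under these assumptions the model reproduces the reported results: the observed "foo bar" structure returns only docs 3 and 4, while the proposed structure returns all four, the same set as the "bar foo" query.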


