Posted to issues@solr.apache.org by "Rudi Seitz (Jira)" <ji...@apache.org> on 2023/02/10 21:10:00 UTC

[jira] [Updated] (SOLR-16652) multi-term synonym rule applied at query time prevents single-term matching

     [ https://issues.apache.org/jira/browse/SOLR-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rudi Seitz updated SOLR-16652:
------------------------------
    Description: 
The presence of a multi-term synonym equivalence rule applied at query time prevents matching on individual terms in the synonym.

If we issue an edismax query against a text_general field in Solr 9.1, and the query string is "foo bar", we can match documents that have "foo" without "bar" and vice versa. However, if there is a synonym rule like "foo bar,baz" applied at query time, we no longer get single-term matches against "foo" or "bar". Both terms are now required, but can occur in any position: a document can match the query if it contains "foo bar" or "bar foo" or "bar qux foo", for example, but not if it only contains "foo".

However, if we change the text_general analysis chain to apply synonyms at index time, the observed behavior also changes and single-term matches for "foo" or "bar" are again possible.

Why is this an issue? 1) It is counterintuitive that adding a synonym equivalence (as opposed to a unidirectional mapping) would narrow recall compared with having no rule at all. 2) This behavior is a discrepancy in semantics between index-time and query-time synonym expansion.

 

*STEPS TO REPRODUCE*

Use the _default configset with "foo bar,baz" added to synonyms.txt. Index these four docs:

{"id":"1", "title_txt":"foo"}
{"id":"2", "title_txt":"bar"}
{"id":"3", "title_txt":"foo bar"}
{"id":"4", "title_txt":"bar foo"}

 
Issue a query for "foo bar" (i.e. defType=edismax&q.op=OR&qf=title_txt&q=foo bar)
Result: Only docs 3 and 4 come back
 
Issue a query for "bar foo"
Result: All four docs come back; the synonym rule is not invoked
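
The two requests above can be assembled as in the following minimal sketch. The host, port, and the collection name synonym_test are assumptions (the report does not give them), and debugQuery=true is added here only to expose the parsed query; it is not part of the original steps.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and collection name are placeholders, not from the report.
SOLR_SELECT = "http://localhost:8983/solr/synonym_test/select"


def build_query_url(q: str) -> str:
    """Return a select URL using the edismax parameters from the steps above."""
    params = {
        "defType": "edismax",
        "q.op": "OR",
        "qf": "title_txt",
        "q": q,
        "debugQuery": "true",  # added to show the parsed query; not in the original steps
    }
    # urlencode percent-encodes values and turns the space in the query into "+".
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    for q in ("foo bar", "bar foo"):
        print(build_query_url(q))
```

Fetching each printed URL against a running Solr instance should return the document lists described above, along with the parsed query in the debug section.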
 

*OBSERVATIONS*

Note that we could change the synonym rule to "foo bar,baz,foo,bar" but this would mean that a query for "foo" could now match a document containing only "bar", which is not the intent of the original rule.
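
To make the side effect of the widened rule concrete, here is a deliberately simplified sketch. It is not Solr's synonym filter: it assumes an equivalence rule is a comma-separated list in which every entry is interchangeable with every other, and it only expands a query that equals one entry as a whole. The function names are hypothetical.

```python
def parse_equivalence_rule(rule: str) -> list[str]:
    """Split a comma-separated equivalence rule into its entries."""
    return [entry.strip() for entry in rule.split(",") if entry.strip()]


def expand(query: str, rule: str) -> set[str]:
    """Return all rule entries if the query equals one of them, else just the query.

    Simplified model: whole-query matching only, no tokenization or partial matches.
    """
    entries = parse_equivalence_rule(rule)
    return set(entries) if query in entries else {query}


if __name__ == "__main__":
    # Original rule: "foo" alone is not an entry, so it is left untouched.
    print(sorted(expand("foo", "foo bar,baz")))          # ['foo']
    # Widened rule: "foo" is now equivalent to "bar", "baz", and "foo bar".
    print(sorted(expand("foo", "foo bar,baz,foo,bar")))  # ['bar', 'baz', 'foo', 'foo bar']
```

Under the widened rule, a query for "foo" picks up "bar" as an alternative, which is the unintended match described above.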

Note that we could set sow=true, but this would prevent the multi-term synonym from taking effect: the "foo bar" query could now get single-term matches on "foo" or "bar" but couldn't get a match on the synonym "baz".
 
Returning to the original "foo bar,baz" synonym rule with sow=false, if we look at the explain output for the "foo bar" query we see:

{{+((title_txt:baz (+title_txt:foo +title_txt:bar)))}}
 
Looking at the explain output for "bar foo" we see:

{{+((title_txt:bar) (title_txt:foo))}}
 
So, the observed behavior makes sense according to the low-level query structure, but is still counterintuitive for the reasons described above.
 
Why not expand the "foo bar" query like this instead?
 
{{+((title_txt:baz (title_txt:foo title_txt:bar)))}}
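
The three query structures shown above can be compared with a small in-memory model. This is a sketch under stated assumptions, not Lucene code: a clause prefixed with + is treated as required, an unprefixed clause as optional, a boolean group with no required clauses needs at least one optional clause to match, and documents are reduced to sets of whitespace-separated tokens. The helper names are hypothetical.

```python
# The four documents from the steps to reproduce.
DOCS = {
    "1": "foo",
    "2": "bar",
    "3": "foo bar",
    "4": "bar foo",
}


def term(t):
    """Clause that matches when the token is present in the document."""
    return lambda tokens: t in tokens


def boolean(must=(), should=()):
    """Boolean group: all required clauses must match; if there are none,
    at least one optional clause must match (optional clauses next to
    required ones affect only scoring, which this model ignores)."""
    def matches(tokens):
        if must:
            return all(clause(tokens) for clause in must)
        return any(clause(tokens) for clause in should)
    return matches


# +((title_txt:baz (+title_txt:foo +title_txt:bar)))
OBSERVED_FOO_BAR = boolean(must=[boolean(should=[
    term("baz"), boolean(must=[term("foo"), term("bar")])])])

# +((title_txt:bar) (title_txt:foo))
OBSERVED_BAR_FOO = boolean(must=[boolean(should=[term("bar"), term("foo")])])

# +((title_txt:baz (title_txt:foo title_txt:bar)))
PROPOSED_FOO_BAR = boolean(must=[boolean(should=[
    term("baz"), boolean(should=[term("foo"), term("bar")])])])


def matching_ids(query):
    """Return the ids of the documents the query matches, in insertion order."""
    return [doc_id for doc_id, text in DOCS.items() if query(set(text.split()))]


if __name__ == "__main__":
    print(matching_ids(OBSERVED_FOO_BAR))  # ['3', '4']
    print(matching_ids(OBSERVED_BAR_FOO))  # ['1', '2', '3', '4']
    print(matching_ids(PROPOSED_FOO_BAR))  # ['1', '2', '3', '4']
```

In this model the proposed expansion returns the same four documents for "foo bar" as the unexpanded "bar foo" query does, while still allowing a match on "baz".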
 

 

 


> multi-term synonym rule applied at query time prevents single-term matching
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-16652
>                 URL: https://issues.apache.org/jira/browse/SOLR-16652
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 9.1
>            Reporter: Rudi Seitz
>            Priority: Major
>


