Posted to users@solr.apache.org by Rudi Seitz <ru...@rudiseitz.com> on 2023/02/09 17:18:33 UTC

multi-term synonym prevents single-term match -- known issue?

Is this known behavior or is it worth a JIRA ticket?

Searching against a text_general field in Solr 9.1, if my edismax query is
"foo bar", I should be able to get matches for "foo" without "bar" and vice
versa. However, if there happens to be a synonym rule applied at query
time, like "foo bar,zzz", I can no longer get single-term matches against
"foo" or "bar". Both terms are now required, but can occur in either order.
If we change the text_general analysis chain to apply synonyms at index
time instead of query time, this behavior goes away and single-term matches
are again possible.

To reproduce, use the _default configset with "foo bar,zzz" added to
synonyms.txt. Index these four docs:

{"id":"1", "title_txt":"foo"}
{"id":"2", "title_txt":"bar"}
{"id":"3", "title_txt":"foo bar"}
{"id":"4", "title_txt":"bar foo"}

Issue a query for "foo bar" (i.e.
defType=edismax&q.op=OR&qf=title_txt&q=foo bar)
Result: Only docs 3 and 4 come back

Issue a query for "bar foo"
Result: All four docs come back; the synonym rule is not invoked
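For anyone replaying this report, the steps above can be scripted with only the Python standard library. This is a minimal sketch, not part of the original report: the base URL and the collection name synonym_test are hypothetical, and the collection is assumed to be created from the _default configset with "foo bar,zzz" appended to synonyms.txt. debugQuery=true is added so that the debug section of the response shows the parsed query quoted in this message.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption (hypothetical): a local Solr 9 node with a collection named
# "synonym_test", created from the _default configset after appending the
# line "foo bar,zzz" to synonyms.txt.
SOLR = "http://localhost:8983/solr/synonym_test"

DOCS = [
    {"id": "1", "title_txt": "foo"},
    {"id": "2", "title_txt": "bar"},
    {"id": "3", "title_txt": "foo bar"},
    {"id": "4", "title_txt": "bar foo"},
]


def update_request(docs):
    """Build a POST to /update that indexes docs as a JSON array and commits."""
    return Request(
        SOLR + "/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def select_url(q):
    """Build the edismax request from the report; debugQuery exposes the parsed query."""
    params = {
        "defType": "edismax",
        "q.op": "OR",
        "qf": "title_txt",
        "q": q,
        "fl": "id",
        "debugQuery": "true",
    }
    return SOLR + "/select?" + urlencode(params)


def run():
    """Index the docs, then print matching ids and the parsed query (needs a live Solr)."""
    with urlopen(update_request(DOCS)) as resp:
        resp.read()
    for q in ("foo bar", "bar foo"):
        with urlopen(select_url(q)) as resp:
            data = json.load(resp)
        ids = sorted(doc["id"] for doc in data["response"]["docs"])
        print(q, "->", ids, data["debug"]["parsedquery"])
```

Nothing runs on import; call run() against a live node. Per the report, "foo bar" should then return only ids 3 and 4, and "bar foo" all four.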

Looking at the explain output for "foo bar" we see:

+((title_txt:zzz (+title_txt:foo +title_txt:bar)))


Looking at the explain output for "bar foo" we see:

+((title_txt:bar) (title_txt:foo))

So, the observed behavior makes sense according to the low-level query
structure. But -- is this how it's "supposed" to work?

Why not expand the "foo bar" query like this instead?

+((title_txt:zzz (title_txt:foo title_txt:bar)))

Rudi
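The parsed queries above can be checked by hand with a toy model of Lucene's BooleanQuery matching: a query with at least one MUST (+) clause matches only documents that satisfy every MUST clause, while a query with only SHOULD clauses needs at least one of them to match. This is a simplified illustration, not Solr code: scoring is ignored, the field prefix is dropped, and the extra wrapper parentheses that edismax adds are flattened on the assumption that, with a single qf field, they don't change which documents match.

```python
# A toy model of Lucene BooleanQuery matching (scoring ignored).
# A query is either ("term", text) or ("bool", [(occur, subquery), ...]),
# where occur is "MUST" or "SHOULD".

def term(text):
    return ("term", text)


def must(*subqueries):
    return ("bool", [("MUST", q) for q in subqueries])


def should(*subqueries):
    return ("bool", [("SHOULD", q) for q in subqueries])


def matches(query, doc_terms):
    """Return True if a document with the given set of terms matches query."""
    kind, body = query
    if kind == "term":
        return body in doc_terms
    required = [q for occur, q in body if occur == "MUST"]
    optional = [q for occur, q in body if occur == "SHOULD"]
    if not all(matches(q, doc_terms) for q in required):
        return False
    if required:
        # With at least one MUST clause, SHOULD clauses only affect scoring.
        return True
    # A pure-SHOULD query needs at least one clause to match.
    return any(matches(q, doc_terms) for q in optional)


# The four title_txt values from the report, as term sets.
DOCS = {
    "1": {"foo"},
    "2": {"bar"},
    "3": {"foo", "bar"},
    "4": {"bar", "foo"},
}

# Current "foo bar":  +((title_txt:zzz (+title_txt:foo +title_txt:bar)))
CURRENT_FOO_BAR = must(should(term("zzz"), must(term("foo"), term("bar"))))
# "bar foo", no synonym:  +((title_txt:bar) (title_txt:foo))
BAR_FOO = must(should(term("bar"), term("foo")))
# Proposed "foo bar":  +((title_txt:zzz (title_txt:foo title_txt:bar)))
PROPOSED_FOO_BAR = must(should(term("zzz"), should(term("foo"), term("bar"))))


def hits(query):
    return [doc_id for doc_id, terms in DOCS.items() if matches(query, terms)]


print(hits(CURRENT_FOO_BAR))   # ['3', '4']
print(hits(BAR_FOO))           # ['1', '2', '3', '4']
print(hits(PROPOSED_FOO_BAR))  # ['1', '2', '3', '4']
```

The nested all-MUST group (+foo +bar) is what excludes docs 1 and 2; in the proposed form the same group is a plain disjunction, so single-term matches come back.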

Re: multi-term synonym prevents single-term match -- known issue?

Posted by Mikhail Khludnev <mk...@apache.org>.
Opened a reproducer: https://github.com/apache/lucene/pull/12157

-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!

Re: multi-term synonym prevents single-term match -- known issue?

Posted by Mikhail Khludnev <mk...@apache.org>.
It's time to summon the Lucene devs:
https://issues.apache.org/jira/browse/SOLR-16652?focusedCommentId=17687998&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17687998

It seems to be by design:
https://github.com/apache/lucene/blob/main/lucene/queryparser/src/test/org/apache/lucene/queryparser/classic/TestQueryParser.java#L591
That test sets up a multi-word synonym, "guinea pig => cavy", and expects:
dumb.parse("guinea pig") => ((+field:guinea +field:pig) field:cavy)
That query doesn't match a lone 'guinea', which is the behavior this ticket asks for.

Re: multi-term synonym prevents single-term match -- known issue?

Posted by Rudi Seitz <ru...@rudiseitz.com>.
Thanks Mikhail.
I think your directional approach ("foo bar=>baz,foo,bar") would work, but
we'd also need "baz=>baz,foo bar" for a complete workaround.
I've added your message as a comment on the ticket.
Rudi
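The trade-off between the equivalence rule and the two directional rules can be sketched with a deliberately simplified model (not SynonymGraphFilter): a rule fires only when it matches the whole query, each alternative path is all-MUST, and alternatives are OR'ed, mirroring the parsed queries in this thread. parse_rules, expand and hits are hypothetical helpers, and "baz" stands in for "zzz" as in the ticket discussion.

```python
# A deliberately simplified model of query-time synonym expansion.
# Assumptions: a rule only fires when it matches the *whole* query, later rules
# for the same left-hand side overwrite earlier ones, every multi-term path is
# all-MUST, and separate paths are OR'ed.

def parse_rules(lines):
    """Parse 'a b=>x,y' (explicit mapping) and 'a,b,c' (equivalence) lines."""
    rules = {}
    for line in lines:
        if "=>" in line:
            lhs, rhs = line.split("=>", 1)
            targets = [t.strip() for t in rhs.split(",")]
            for source in lhs.split(","):
                rules[source.strip()] = targets
        else:
            group = [t.strip() for t in line.split(",")]
            for source in group:
                rules[source] = group
    return rules


def expand(query, rules):
    """Return the alternative token paths for a query."""
    normalized = " ".join(query.lower().split())
    if normalized in rules:
        return [alt.split() for alt in rules[normalized]]
    # No rule: each term is its own optional (q.op=OR) clause.
    return [[t] for t in normalized.split()]


DOCS = {doc_id: set(text.split()) for doc_id, text in
        [("1", "foo"), ("2", "bar"), ("3", "foo bar"), ("4", "bar foo")]}


def hits(query, rules):
    """Ids of docs that contain every token of at least one path."""
    paths = expand(query, rules)
    return [doc_id for doc_id, terms in DOCS.items()
            if any(all(t in terms for t in path) for path in paths)]


equivalent = parse_rules(["foo bar,baz,foo,bar"])
one_way = parse_rules(["foo bar=>baz,foo,bar"])
both_ways = parse_rules(["foo bar=>baz,foo,bar", "baz=>baz,foo bar"])

print(hits("foo", equivalent))    # ['1', '2', '3', '4']  (bar-only doc 2 matches)
print(hits("foo", one_way))       # ['1', '3', '4']
print(hits("foo bar", one_way))   # ['1', '2', '3', '4']
print(hits("baz", one_way))       # []
print(hits("baz", both_ways))     # ['3', '4']
```

With only the first directional rule, "foo" stops matching the bar-only doc, but a query for "baz" no longer reaches the "foo bar" documents; the reverse rule baz=>baz,foo bar restores that.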

Re: multi-term synonym prevents single-term match -- known issue?

Posted by Mikhail Khludnev <mk...@apache.org>.
Thanks for raising a ticket. Here are just two considerations:
> we could change the synonym rule to "foo bar,baz,foo,bar" but this would
> mean that a query for "foo" could now match a document containing only
> "bar", which is not the intent of the original rule.
OK. The latter issue can probably be fixed with a directional synonym rule:
foo bar=>baz,foo,bar
Right, it seems like a weird band-aid.

I stepped through the Lucene code; the MUST occur for synonyms is set here:
https://github.com/apache/lucene/blob/7baa01b3c2f93e6b172e986aac8ef577a87ebceb/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L534
Presumably, the original terms could use the defaultOperator, while the
synonym replacement keeps MUST.
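To make the proposal concrete, here is a hypothetical string-building sketch (render_path and span_query are illustrative names, not Lucene API) that mimics how the parsed queries in this thread are printed. Each synonym-matched span becomes a disjunction of its alternative paths; currently every multi-term path is all-MUST, while under the proposal the path the user actually typed would follow q.op instead. Path order is simply the order passed in.

```python
# Hypothetical helpers: they only render strings in the style of the parsed
# queries shown in this thread; they do not call Lucene.

def render_path(tokens, occur, field):
    """Render one token path; multi-token paths become a nested boolean group."""
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"
    prefix = "+" if occur == "MUST" else ""
    return "(" + " ".join(f"{prefix}{field}:{t}" for t in tokens) + ")"


def span_query(paths, original, field, q_op="OR", proposed=False):
    """Build the SHOULD-disjunction over all paths of one synonym-matched span.

    proposed=False: every path's terms are MUST (the behavior reported here).
    proposed=True: the path the user typed follows q_op; synonym paths keep MUST.
    """
    default_occur = "MUST" if q_op == "AND" else "SHOULD"
    parts = []
    for path in paths:
        occur = default_occur if (proposed and path == original) else "MUST"
        parts.append(render_path(path, occur, field))
    return "(" + " ".join(parts) + ")"


paths = [["zzz"], ["foo", "bar"]]
original = ["foo", "bar"]
print(span_query(paths, original, "title_txt"))
# (title_txt:zzz (+title_txt:foo +title_txt:bar))
print(span_query(paths, original, "title_txt", proposed=True))
# (title_txt:zzz (title_txt:foo title_txt:bar))
```

The same helper reproduces the TestQueryParser expectation quoted in this thread: span_query([["guinea", "pig"], ["cavy"]], ["guinea", "pig"], "field") gives ((+field:guinea +field:pig) field:cavy). With q_op="AND" the proposal changes nothing, which is the intended behavior for an AND query.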

On Sat, Feb 11, 2023 at 12:17 AM Rudi Seitz <ru...@rudiseitz.com> wrote:

> Thanks Mikhail and Michael.
> Based on your feedback, I created a ticket:
> https://issues.apache.org/jira/browse/SOLR-16652
> In the ticket, I mentioned why updating the synonym rule or setting
> sow=true causes other problems in this case, unfortunately. I haven't yet
> looked through code to see where the behavior could be changed.
> Rudi
>
>
> On Fri, Feb 10, 2023 at 11:26 AM Michael Gibney <michael@michaelgibney.net> wrote:
>
> > Rudi,
> >
> > I agree, this does not seem like how it should behave. Probably
> > something that could be fixed in edismax, not something lower-level
> > (Lucene)?
> >
> > Michael
> >
> > On Fri, Feb 10, 2023 at 9:38 AM Mikhail Khludnev <mk...@apache.org> wrote:
> > >
> > > Hello, Rudi.
> > > Well, it doesn't seem perfect. Probably it can be fixed via
> > > foo bar,zzz,foo,bar
> > > And in some sense this behavior is reasonable.
> > > You can also experiment with the sow and pf params (the latter is
> > > described on the dismax page only).

Re: multi-term synonym prevents single-term match -- known issue?

Posted by Rudi Seitz <ru...@rudiseitz.com>.
Thanks Mikhail and Michael.
Based on your feedback, I created a ticket:
https://issues.apache.org/jira/browse/SOLR-16652
In the ticket, I mentioned why updating the synonym rule or setting
sow=true causes other problems in this case, unfortunately. I haven't yet
looked through code to see where the behavior could be changed.
Rudi
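
One plausible side effect of the rule-based workaround (not necessarily the one described in SOLR-16652): in synonyms.txt, a comma-separated line without "=>" declares all of its entries equivalent. So "foo bar,zzz,foo,bar" also makes "foo" and "bar" synonyms of each other. The Python sketch below is a hypothetical, deliberately simplified model. It only looks up the whole query phrase and does no token-graph matching. It shows how a single-term query for "foo" would pick up "bar" as well.

```python
def parse_equivalent_rule(line: str) -> list[str]:
    """Split an equivalence line such as 'foo bar,zzz' into its entries."""
    return [entry.strip() for entry in line.split(",") if entry.strip()]


def expansions(rules: list[str], query_phrase: str) -> set[str]:
    """Hypothetical, simplified model: if the whole query phrase appears in an
    equivalence group, it expands to every entry of that group (expand=true)."""
    result = {query_phrase}
    for rule in rules:
        group = parse_equivalent_rule(rule)
        if query_phrase in group:
            result.update(group)
    return result


original_rules = ["foo bar,zzz"]
workaround_rules = ["foo bar,zzz,foo,bar"]

# With the original rule, a query for just "foo" is left alone.
print(sorted(expansions(original_rules, "foo")))    # ['foo']
# With the workaround rule, "foo" also expands to "bar", "foo bar" and "zzz".
print(sorted(expansions(workaround_rules, "foo")))  # ['bar', 'foo', 'foo bar', 'zzz']
```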


On Fri, Feb 10, 2023 at 11:26 AM Michael Gibney <mi...@michaelgibney.net>
wrote:

> Rudi,
>
> I agree, this does not seem like how it should behave. Probably
> something that could be fixed in edismax, not something lower-level
> (Lucene)?
>
> Michael
>

Re: multi-term synonym prevents single-term match -- known issue?

Posted by Michael Gibney <mi...@michaelgibney.net>.
Rudi,

I agree, this does not seem like how it should behave. Probably
something that could be fixed in edismax, not something lower-level
(Lucene)?

Michael
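
The following is a hypothetical Python model of the query tree, not edismax or Lucene code, that makes the proposed change concrete. When q.op=OR, the rewrite turns the required (MUST) clauses inside the multi-term synonym alternative into optional (SHOULD) clauses. It renders only the synonym subtree, without the outer +(...) wrapper that edismax adds. It is also naive: it relaxes every MUST in whatever subtree it receives, so it should only be applied to the synonym expansion.

```python
from dataclasses import dataclass


@dataclass
class Term:
    field: str
    text: str


@dataclass
class Bool:
    # Each clause is an (occur, node) pair; occur is "MUST" or "SHOULD".
    clauses: list


def render(node) -> str:
    """Render a node roughly the way Lucene prints nested boolean queries."""
    if isinstance(node, Term):
        return f"{node.field}:{node.text}"
    parts = [("+" if occur == "MUST" else "") + render(child) for occur, child in node.clauses]
    return "(" + " ".join(parts) + ")"


def relax_required(node, default_operator: str):
    """Hypothetical rewrite: under q.op=OR, make every MUST clause in this
    subtree optional (SHOULD). Leaves the tree unchanged for other operators."""
    if isinstance(node, Term) or default_operator != "OR":
        return node
    return Bool([("SHOULD", relax_required(child, default_operator)) for _, child in node.clauses])


# Synonym subtree for q=foo bar with the rule "foo bar,zzz", as seen in the explain output.
observed = Bool([
    ("SHOULD", Term("title_txt", "zzz")),
    ("SHOULD", Bool([("MUST", Term("title_txt", "foo")), ("MUST", Term("title_txt", "bar"))])),
])

print(render(observed))                        # (title_txt:zzz (+title_txt:foo +title_txt:bar))
print(render(relax_required(observed, "OR")))  # (title_txt:zzz (title_txt:foo title_txt:bar))
```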

On Fri, Feb 10, 2023 at 9:38 AM Mikhail Khludnev <mk...@apache.org> wrote:
>
> Hello, Rudi.
> Well, it doesn't seem perfect. Probably it can be fixed via
> foo bar,zzz,foo,bar
> And in some sense this behavior is reasonable.
> Also, you can experiment with the sow and pf params (the latter param is
> described on the dismax page only).

Re: multi-term synonym prevents single-term match -- known issue?

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Rudi.
Well, it doesn't seem perfect. Probably it can be fixed via
foo bar,zzz,foo,bar
And in some sense this behavior is reasonable.
Also, you can experiment with the sow and pf params (the latter param is
described on the dismax page only).
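
This minimal Python sketch builds the /select URLs for experimenting with these parameters. The host, port and collection name in BASE_URL are placeholders. debugQuery=true is added so the response includes the parsed query. The sketch only constructs URLs and does not contact Solr.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port and collection name are assumptions.
BASE_URL = "http://localhost:8983/solr/mycollection/select"


def edismax_url(q: str, **extra: str) -> str:
    """Build a /select URL matching the original reproduction, plus optional params."""
    params = {
        "defType": "edismax",
        "q.op": "OR",
        "qf": "title_txt",
        "q": q,
        "debugQuery": "true",  # include parsedquery/explain in the response
    }
    params.update(extra)
    return BASE_URL + "?" + urlencode(params)


print(edismax_url("foo bar"))                  # baseline from the report
print(edismax_url("foo bar", sow="true"))      # split on whitespace before analysis
print(edismax_url("foo bar", pf="title_txt"))  # phrase boost on title_txt
```

With sow=true, edismax analyzes "foo" and "bar" separately, so the multi-term rule "foo bar,zzz" should no longer fire at query time. pf only adds a phrase boost for docs where the terms appear together. It does not change which docs match.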

On Thu, Feb 9, 2023 at 8:19 PM Rudi Seitz <ru...@rudiseitz.com> wrote:

> Is this known behavior or is it worth a JIRA ticket?
>
> Searching against a text_general field in Solr 9.1, if my edismax query is
> "foo bar" I should be able to get matches for "foo" without "bar" and vice
> versa. However, if there happens to be a synonym rule applied at query
> time, like "foo bar,zzz" I can no longer get single-term matches against
> "foo" or "bar." Both terms are now required, but can occur in either order.
> If we change the text_general analysis chain to apply synonyms at index
> time instead of query time, this behavior goes away and single-term matches
> are again possible.
>
> To reproduce, use the _default configset with "foo bar,zzz" added to
> synonyms.txt. Index these four docs:
>
> {"id":"1", "title_txt":"foo"}
> {"id":"2", "title_txt":"bar"}
> {"id":"3", "title_txt":"foo bar"}
> {"id":"4", "title_txt":"bar foo"}
>
> Issue a query for "foo bar" (i.e.
> defType=edismax&q.op=OR&qf=title_txt&q=foo bar)
> Result: Only docs 3 and 4 come back
>
> Issue a query for "bar foo"
> Result: All four docs come back; the synonym rule is not invoked
>
> Looking at the explain output for "foo bar" we see:
>
> +((title_txt:zzz (+title_txt:foo +title_txt:bar)))
>
>
> Looking at the explain output for "bar foo" we see:
>
> +((title_txt:bar) (title_txt:foo))
>
> So, the observed behavior makes sense according to the low-level query
> structure. But -- is this how it's "supposed" to work?
>
> Why not expand the "foo bar" query like this instead?
>
> +((title_txt:zzz (title_txt:foo title_txt:bar)))
>
> Rudi
>
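
The match results above follow directly from the boolean structure. This small hypothetical Python model treats each document as a set of tokens and ignores positions and scoring. It reproduces the results for three queries: the observed expansion of "foo bar", the unexpanded "bar foo" query, and the proposed expansion.

```python
# Documents from the reproduction, modeled as sets of indexed tokens
# (positions and scoring are ignored).
DOCS = {
    "1": {"foo"},
    "2": {"bar"},
    "3": {"foo", "bar"},
    "4": {"bar", "foo"},
}


def matches(query, tokens: set) -> bool:
    """A string is a term; ("AND", ...) requires all children, ("OR", ...) any."""
    if isinstance(query, str):
        return query in tokens
    op, *children = query
    results = [matches(child, tokens) for child in children]
    return all(results) if op == "AND" else any(results)


def hits(query) -> list:
    return sorted(doc_id for doc_id, tokens in DOCS.items() if matches(query, tokens))


observed = ("OR", "zzz", ("AND", "foo", "bar"))  # +((title_txt:zzz (+title_txt:foo +title_txt:bar)))
unexpanded = ("OR", "bar", "foo")                # +((title_txt:bar) (title_txt:foo))
proposed = ("OR", "zzz", ("OR", "foo", "bar"))   # +((title_txt:zzz (title_txt:foo title_txt:bar)))

print(hits(observed))    # ['3', '4']
print(hits(unexpanded))  # ['1', '2', '3', '4']
print(hits(proposed))    # ['1', '2', '3', '4']
```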

