Posted to users@solr.apache.org by Imre Papuscan <ip...@cylex.ro.INVALID> on 2023/03/09 15:29:51 UTC

Solr parsed query is not consistent

Hello,


I am using the edismax parser with default qf = field1 field2. When I query with q=hello world, debug mode shows that it builds a query like



parsedquery_toString: "+((((field1:hello) | (field2:hello)) ((field1:world) | (field2:world)))~2) ()",



but for other searches like q=hello universe



parsedquery_toString: "+(+((+field1:hello +field1:universe) | (+field2:hello +field2:universe)))"



The first is our expected behavior, and it is how the large majority of queries work; only a few exceptions get the second parsed query. The second is more restrictive: it forces both terms to be found in the same field, while the first also finds documents where the first term is in one field and the second term is in the other. There is a difference in how the score is computed as well: SUM of MAX vs. MAX of SUM.
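To make the difference concrete, here is a minimal sketch in Python rather than Solr's actual scoring code. The per-field scores are made-up numbers, the function names are hypothetical, and it assumes a dismax tie value of 0 so that each disjunction takes a plain maximum.

```python
# Hypothetical per-field, per-term scores for one document (made-up numbers).
# A missing entry means the term does not occur in that field.
scores = {
    "field1": {"hello": 1.5},
    "field2": {"universe": 0.5},
}
terms = ["hello", "universe"]


def sum_of_max(scores, terms):
    """First parsed query: each term may match in any field.

    Returns the sum over terms of the best field score, or None if some
    required term matches in no field at all.
    """
    total = 0.0
    for term in terms:
        per_field = [s[term] for s in scores.values() if term in s]
        if not per_field:
            return None
        total += max(per_field)
    return total


def max_of_sum(scores, terms):
    """Second parsed query: all terms must match in the same field.

    Returns the best per-field total, or None if no single field
    contains every term.
    """
    totals = [
        sum(s[t] for t in terms)
        for s in scores.values()
        if all(t in s for t in terms)
    ]
    return max(totals) if totals else None


print(sum_of_max(scores, terms))
print(max_of_sum(scores, terms))
```

With this sample document, where each term sits in a different field, the first form matches and the second does not.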



There was no problem in version 4.3.1. We first experienced it 3 years ago with 8.4.1 (SolrCloud), and now again with 9.0.0.



I have found two workarounds:

1. using q=hello AND universe, even though q.op=AND is already set

2. using defType=dismax

With both solutions the first parsed query is used.



Our thought was that it is related to HunSpellFilter when one of the terms has synonyms (the parsed query looks different and contains the synonyms as well), but it is triggered for some words and not for all words that have synonyms. Other examples occur with matches in fields that have a custom type instead of string. But there is no general rule. Most of the fields are of the form where By playing with qf values in queries, we found that excluding certain fields fixes the query execution plan, but there is nothing special about those fields, and it is not the same field for different searches.



Could someone explain what sometimes triggers the second parsed query? Is it a bug in Solr?



Thank you,

Imre Papuscan

Re: Solr parsed query is not consistent

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/9/23 08:29, Imre Papuscan wrote:
> I am using the edismax parser with default qf = field1 field2. When I query with q=hello world, debug mode shows that it builds a query like
> 
> parsedquery_toString: "+((((field1:hello) | (field2:hello)) ((field1:world) | (field2:world)))~2) ()",
> 
> but for other searches like q=hello universe
> 
> parsedquery_toString: "+(+((+field1:hello +field1:universe) | (+field2:hello +field2:universe)))"
> 
> The first is our expected behavior, and it is how the large majority of queries work; only a few exceptions get the second parsed query. The second is more restrictive: it forces both terms to be found in the same field, while the first also finds documents where the first term is in one field and the second term is in the other. There is a difference in how the score is computed as well: SUM of MAX vs. MAX of SUM.

<snip>

> Our thought was that it is related to HunSpellFilter when one of the terms has synonyms (the parsed query looks different and contains the synonyms as well), but it is triggered for some words and not for all words that have synonyms. Other examples occur with matches in fields that have a custom type instead of string. But there is no general rule. Most of the fields are of the form where By playing with qf values in queries, we found that excluding certain fields fixes the query execution plan, but there is nothing special about those fields, and it is not the same field for different searches.

Something you can do in the admin UI is go to the Analysis tab and try
different inputs to see how different text is analyzed at both index
time and query time.  It shows the actual terms at each analysis step,
which should provide you with enough information to determine whether
your suspicion about the spell filter and synonyms is correct.
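
The same check can be scripted over HTTP. This is a rough sketch, assuming the field analysis request handler is reachable at /analysis/field on your collection (worth verifying on your install). The base URL, the collection name and the analysis_url helper are hypothetical.

```python
from urllib.parse import urlencode


def analysis_url(base_url, field_name, query_text):
    """Build a field-analysis request URL for the query-time analysis chain.

    base_url is a hypothetical collection URL, for example
    http://localhost:8983/solr/mycollection
    """
    params = {
        "analysis.fieldname": field_name,  # field whose analyzer is used
        "analysis.query": query_text,      # text run through the query analyzer
        "wt": "json",
    }
    return base_url + "/analysis/field?" + urlencode(params)


# Compare a query that parses as expected with one that does not.
for text in ("hello world", "hello universe"):
    print(analysis_url("http://localhost:8983/solr/mycollection", "field1", text))
```

Fetching both URLs and comparing the token lists per analysis stage should show whether one input produces extra tokens, for example synonyms or stems at the same position.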

The sow parameter might also be something to investigate.  It did not 
exist on 4.x ... Solr always acted as if sow=true, so the query input on 
TextField types was split on whitespace into separate terms before it 
was given to the query analysis chain.  In whichever 6.x version 
introduced the sow parameter, it defaulted to true, but the default 
changed to false in 7.0.
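
One way to test this is to run the same query with sow=true and sow=false and compare the parsed queries. A minimal sketch, assuming a hypothetical base URL and the field names from the original post. The helper names are made up, and parsed_query needs a running Solr to return anything.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def debug_query_url(base_url, q, sow):
    """Build an edismax /select URL with debug output and an explicit sow value."""
    params = {
        "q": q,
        "defType": "edismax",
        "qf": "field1 field2",          # field names taken from the original post
        "sow": "true" if sow else "false",
        "debugQuery": "true",
        "rows": "0",                    # only the debug section is of interest
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


def parsed_query(base_url, q, sow):
    """Fetch parsedquery_toString from the debug section (requires a live Solr)."""
    with urlopen(debug_query_url(base_url, q, sow)) as resp:
        return json.load(resp)["debug"]["parsedquery_toString"]


# Hypothetical collection URL; adjust to your setup.
base = "http://localhost:8983/solr/mycollection"
print(debug_query_url(base, "hello universe", sow=True))
print(debug_query_url(base, "hello universe", sow=False))
```

If the parsed query for q=hello universe switches to the first form with sow=true, that would point at the whitespace handling rather than at a bug.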

Thanks,
Shawn