Posted to users@solr.apache.org by Ivana Pranjic <iv...@gulp.de.INVALID> on 2023/03/21 10:58:59 UTC

Generated query when searching for phrase

Hi Solr community,

We are currently trying to upgrade Solr from v8 to v9 and have stumbled
upon an issue: the queries that we use for search generate many more
clauses than before, hitting the maxBooleanClauses limit for some simple
queries (even if we increase the limit). I'll try to describe our issue
as concisely as possible.
When we search for a phrase like "SAP S/4HANA" and use synonym expansion,
in the parsedQuery in Solr 8, we can see this:

parsedquery_toString":"+(spanNear([spanOr([body:sap-anwend,
body:sap-anwendungsbereich, body:sap-anwendungsbereich,
body:sap-bereich, body:sap-erfahr, body:sap-expertis ...

etc., whereas the same search in Solr 9 yields a different parsedQuery:

"parsedquery_toString":"+((body:\"sap-anwend sap business suit 4
hana\" body:\"sap-anwend sap business suit 4 sap hana\"
body:\"sap-anwend sap business suit for hana\" ...

When we analyzed this, we noticed that Solr 9 creates one phrase clause
for every combination of a synonym of the term SAP with a synonym of
S/4HANA. Since the term SAP alone has about 300 synonyms in our
synonyms.txt, combining them with the synonyms for S/4HANA pushed the
number of clauses to over 2000. If we search for more terms and fields,
this easily explodes into a giant parsedQuery and we get the
maxBooleanClauses error.
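
To make the growth concrete, here is a minimal Python sketch of the two
query shapes described above. It is only a simplified model, not Solr's
or Lucene's actual query building: the synonym lists are made up for
illustration (the real synonyms.txt is not shown in this thread), and
the function names are hypothetical.

```python
from itertools import product

# Hypothetical synonym alternatives per phrase position (assumed values).
positions = [
    ["sap", "sap-anwend", "sap-bereich"],
    ["s 4 hana", "sap business suit 4 hana", "sap business suit for hana"],
]

def phrase_clauses(positions):
    """One phrase per combination of alternatives, as in the Solr 9 output."""
    return [" ".join(combo) for combo in product(*positions)]

def nested_leaf_count(positions):
    """Leaf terms in a spanNear([spanOr([...]), spanOr([...])]) style query."""
    return sum(len(alternatives) for alternatives in positions)

clauses = phrase_clauses(positions)
print(len(clauses))                  # grows with the product of the list sizes
print(nested_leaf_count(positions))  # grows with the sum of the list sizes
```

With about 300 alternatives in the first position, even a handful of
alternatives in the second position is enough to pass 2000 combinations
in the first shape, while the second shape only adds the list sizes.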

Looking at the documentation and code, we could not figure out why
there is a difference in Solr 9, what exactly was changed in the
implementation, and what happened to spanNear and spanOr. The queries
that we use in Solr 8 have not had performance issues so far.

What are we missing? Is there a way to avoid creating combinations of
synonyms when searching for phrases? It does not seem to happen when
doing a regular search for both terms, SAP S/4HANA, without quotes.

One thing that we probably should do is minimize the number of
synonyms in our file, or give up on searching for multiword phrases.

I hope there is someone that can enlighten us in this matter :)

Thank you!

Best regards

*Ivana Pranjic*
Software Developer

Re: Generated query when searching for phrase

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/21/23 04:58, Ivana Pranjic wrote:
> hitting the
> maxBooleanClauses limit for some simple queries (even if we increase the
> limit).

How are you setting maxBooleanClauses?  If you're doing it in 
solrconfig.xml, that is problematic.  maxBooleanClauses is a global 
limit in Lucene ... if you don't set it in solr.xml, then you need to 
change EVERY solrconfig.xml for every core/collection, or one of them 
could reset it back to the default value of 1024 across the board.
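
As a rough illustration of checking for that situation, the Python
sketch below compares the maxBooleanClauses value across several
solrconfig.xml files. The core names and inline XML are hypothetical
stand-ins for real files, and the helper names are made up; it assumes
the setting sits under the <query> section of each solrconfig.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical solrconfig.xml contents keyed by core name (assumed data).
configs = {
    "core_a": "<config><query><maxBooleanClauses>4096</maxBooleanClauses></query></config>",
    "core_b": "<config><query><maxBooleanClauses>1024</maxBooleanClauses></query></config>",
    "core_c": "<config><query/></config>",
}

def max_boolean_clauses(xml_text):
    """Return the raw maxBooleanClauses text, or None if the element is absent."""
    node = ET.fromstring(xml_text).find("./query/maxBooleanClauses")
    return node.text.strip() if node is not None and node.text else None

def find_mismatches(configs):
    """Return every core's value if the cores disagree, else an empty dict."""
    values = {name: max_boolean_clauses(text) for name, text in configs.items()}
    return values if len(set(values.values())) > 1 else {}

for core, value in find_mismatches(configs).items():
    print(core, value)
```

An empty result means all inspected files agree; it says nothing about
what is set in solr.xml.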

Thanks,
Shawn

Re: Generated query when searching for phrase

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello Ivana.
I think the change that caused this is LUCENE-9207, "Don't build
SpanQuery in QueryBuilder":
<https://issues.apache.org/jira/browse/LUCENE-9207>
Also, please check the last comments in the corresponding GitHub issue,
apache/lucene #10247 <https://github.com/apache/lucene/issues/10247>,
where I attempted to discuss a way to reproduce the old, buggy nested
span behavior with the new intervals queries.
So far it's stuck; I don't know for what reason.



-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!