Posted to users@solr.apache.org by Thomas Karampelas <tk...@atypon.com> on 2021/03/31 08:14:32 UTC

Edismax skips first part of phrase when q contains explicit field and parenthesis

Hi,

I am running Solr 8.4.1 and issue the following query with the edismax parser:
*defType=edismax&q=Title:(word1 for word2) &pf=Title&q.op=AND*

The parsed query that edismax produces is the following:
+(
     +(
            +(+Title:word1 +Title_en:word2)))
(+(Title:"for word2"))

First, I expected the odd-looking nested MUST operators, since I have read
that they are added when AND is the default operator. Also, in the first main
clause the *for* term is correctly missing, since I have stopword
filtering in my analysis chain.
However, what puzzles me is that pf skips the first word of
my query. This does not happen if I add spaces after the opening and
before the closing parenthesis, like this: *Title:( word1 for word2 )*.

I took a look at the code and found why this happens. Inside
org.apache.solr.search.ExtendedDismaxQParser#addPhraseFieldQueries, pf
ignores clauses that are assigned to a field. The first part (*(word1*) has
Title as its field but the others do not, so only the first part is
skipped. I cannot really understand the reasoning behind this, though. Is
it expected behaviour or is it a bug?
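
The behaviour described above can be illustrated with a deliberately
simplified model. The sketch below is not Solr's code: it assumes the query
is split on whitespace, that a token starting with `name:` becomes a clause
tagged with that field, and that the phrase builder drops fielded clauses
and bare operators. The names `Clause`, `split_into_clauses` and
`phrase_terms` are hypothetical, and the punctuation stripping is only a
crude stand-in for the analysis chain.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a clause is "fielded" when its token starts with "name:".
FIELD_PREFIX = re.compile(r"^(\w+):(.*)$")
# Assumption: bare operators are kept out of the phrase.
OPERATORS = {"AND", "OR", "NOT", "TO"}


@dataclass
class Clause:
    field: Optional[str]  # explicit field, or None for a plain term
    val: str              # raw text of the clause


def split_into_clauses(q: str) -> List[Clause]:
    """Toy splitter: one clause per whitespace-separated token."""
    clauses = []
    for token in q.split():
        match = FIELD_PREFIX.match(token)
        if match:
            clauses.append(Clause(match.group(1), match.group(2)))
        else:
            clauses.append(Clause(None, token))
    return clauses


def phrase_terms(q: str) -> List[str]:
    """Collect the terms a pf-style phrase would be built from."""
    terms = []
    for clause in split_into_clauses(q):
        # Fielded clauses and bare operators are skipped.
        if clause.field is not None or clause.val in OPERATORS:
            continue
        # Crude stand-in for analysis: drop non-word characters.
        word = re.sub(r"\W", "", clause.val)
        if word:
            terms.append(word)
    return terms


print(phrase_terms("Title:(word1 for word2)"))    # ['for', 'word2']
print(phrase_terms("Title:( word1 for word2 )"))  # ['word1', 'for', 'word2']
```

In this model, `Title:(word1` is a single fielded token, so `word1` never
reaches the phrase. With the extra spaces only `Title:(` is fielded, and all
three words survive.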

I know that I could use the qf parameter to target the field directly, but
the above query could be extended to something like Title:(word1 for word2)
OR Abstract:(word3), which I do not know how to express via qf. I also
expected this syntax to work as an alternative in any case.

Thanks,
Thomas

Re: Edismax skips first part of phrase when q contains explicit field and parenthesis

Posted by Thomas Karampelas <tk...@atypon.com>.
Thanks for the answer, Alessandro.

Well, I would expect it to extract the query text from the query (i.e.
strip the field prefix), take word1 for word2, and add it as a phrase
against the Title field. Essentially:

 +(
      +(
             +(+Title:word1 +Title_en:word2)))
 (+(Title:"word1 for word2"))

As I said, going through the code it seems that only the first word is
tagged as belonging to the Title field. Then, to form the phrase query,
edismax omits everything that is tagged as belonging to a field, and so ends
up skipping the first word. This is very puzzling and looks buggy to me,
but I might be missing something from the big picture.

I can see your point about pf and Lucene syntax being at odds, as pf
originated with dismax, but since it is an integral feature of the edismax
parser as well, I expected it to work.

Regarding creating the query manually, we do have a custom parser at the
moment, but I was looking into migrating to edismax.

Thanks.
Thomas

On Tue, May 11, 2021 at 1:44 PM Alessandro Benedetti <a....@sease.io>
wrote:

> >
> > query could be extended to something like Title:(word1 for word2)
> > OR Abstract:(word3) which I do not know how to express it via qf
>
>
> How would you like pf to work with this?
> What is the final query you are aiming for?
> In your case it would probably be better to go fully "custom" and write
> your query yourself instead of relying on the pf parameter.
>
> I suspect pf originated in dismax (where the input is supposed to be just
> free text).
> I doubt it is compatible at all with Lucene syntax in the main query (which
> edismax supports).
>
> Cheers
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io

Re: Edismax skips first part of phrase when q contains explicit field and parenthesis

Posted by Alessandro Benedetti <a....@sease.io>.
>
> query could be extended to something like Title:(word1 for word2)
> OR Abstract:(word3) which I do not know how to express it via qf


How would you like pf to work with this?
What is the final query you are aiming for?
In your case it would probably be better to go fully "custom" and write
your query yourself instead of relying on the pf parameter.

I suspect pf originated in dismax (where the input is supposed to be just
free text).
I doubt it is compatible at all with Lucene syntax in the main query (which
edismax supports).
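
One way to write the query by hand could look like the sketch below. It
builds the phrase clauses on the client and sends them through the bq
(boost query) parameter instead of relying on pf. The helper names
`phrase_boosts` and `build_params` are hypothetical, the regular expression
only handles simple, non-nested `Field:( ... )` groups, and whether bq
scoring is an acceptable substitute for pf is an assumption that would need
testing.

```python
import re
from urllib.parse import urlencode

# Assumption: field groups look like Field:( ... ) with no nested parentheses.
GROUP = re.compile(r"(\w+):\(\s*([^()]*?)\s*\)")


def phrase_boosts(q):
    """Return one Field:"phrase" clause per multi-word field group."""
    boosts = []
    for field, text in GROUP.findall(q):
        words = text.split()
        if len(words) > 1:  # a single word needs no phrase boost
            phrase = " ".join(words)
            boosts.append(f'{field}:"{phrase}"')
    return boosts


def build_params(q):
    """URL-encode the edismax request parameters for the given query."""
    params = [("defType", "edismax"), ("q", q), ("q.op", "AND")]
    params += [("bq", boost) for boost in phrase_boosts(q)]
    return urlencode(params)


query = "Title:(word1 for word2) OR Abstract:(word3)"
print(phrase_boosts(query))
print(build_params(query))
```

The resulting parameter string would be appended to the select handler URL
of the collection in use.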

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Tue, 11 May 2021 at 10:28, Thomas Karampelas <tk...@atypon.com>
wrote:

> Bumping this in case anyone who has an idea missed it.

Re: Edismax skips first part of phrase when q contains explicit field and parenthesis

Posted by Thomas Karampelas <tk...@atypon.com>.
Bumping this in case anyone who has an idea missed it.
