Posted to java-user@lucene.apache.org by Oliver Kaleske <Ol...@ptvgroup.com> on 2016/10/04 13:34:31 UTC

Re: null Query from MultiFieldQueryParser.getFieldQuery

Hi Steve,

Thanks for the fix.

I applied the patch locally to branch_6_2 (the branch closest to my current 6.2.1 dependency) and built Lucene from there.
With that build in my application, the problem I observed is fixed.

Best regards,
Oliver

-----Original Message-----
From: Steve Rowe [mailto:sarowe@gmail.com]
Sent: Friday, 30 September 2016 21:48
To: java-user@lucene.apache.org
Cc: Oliver Kaleske <Ol...@ptvgroup.com>
Subject: Re: null Query from MultiFieldQueryParser.getFieldQuery

Hi Oliver,

Thanks for reporting and for the analysis. This is a bug.

See <https://issues.apache.org/jira/browse/LUCENE-7472>, where I’ve put up a patch with a fix that treats all non-BooleanQuery queries opaquely (like TermQuery), and adds a test for the SynonymQuery case that fails without the patch and succeeds with it.
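
For illustration, here is a minimal, self-contained sketch of the logic discussed in this thread. It is not the Lucene source and not the LUCENE-7472 patch: the Query, TermQuery, PhraseQuery and BooleanQuery types are hypothetical stand-ins for the real classes, and the names MaxTermsSketch, maxTermsBefore and maxTermsAfter are assumptions made up for the example. It only models how maxTerms is derived from the per-field queries, as described in the original report and in the fix summary above.

```java
import java.util.Arrays;
import java.util.List;

public class MaxTermsSketch {
    // Hypothetical stand-ins for Lucene's query classes; not the real API.
    interface Query {}
    static final class TermQuery implements Query {}
    static final class PhraseQuery implements Query {}
    static final class BooleanQuery implements Query {
        final int clauseCount;
        BooleanQuery(int clauseCount) { this.clauseCount = clauseCount; }
    }

    // Behavior as described in the report: only TermQuery and BooleanQuery
    // contribute, so any other query type leaves maxTerms at 0.
    static int maxTermsBefore(List<Query> perFieldQueries) {
        int maxTerms = 0;
        for (Query q : perFieldQueries) {
            if (q instanceof TermQuery) {
                maxTerms = Math.max(1, maxTerms);
            } else if (q instanceof BooleanQuery) {
                maxTerms = Math.max(maxTerms, ((BooleanQuery) q).clauseCount);
            }
        }
        return maxTerms;
    }

    // Behavior as described for the fix: every non-null query that is not a
    // BooleanQuery is treated opaquely and counts as a single term.
    static int maxTermsAfter(List<Query> perFieldQueries) {
        int maxTerms = 0;
        for (Query q : perFieldQueries) {
            if (q == null) {
                continue; // no query was produced for this field
            }
            if (q instanceof BooleanQuery) {
                maxTerms = Math.max(maxTerms, ((BooleanQuery) q).clauseCount);
            } else {
                maxTerms = Math.max(1, maxTerms);
            }
        }
        return maxTerms;
    }

    public static void main(String[] args) {
        // One PhraseQuery per search field, as in the failing scenario.
        List<Query> phraseOnly =
            Arrays.<Query>asList(new PhraseQuery(), new PhraseQuery());
        System.out.println(maxTermsBefore(phraseOnly)); // prints 0
        System.out.println(maxTermsAfter(phraseOnly));  // prints 1
    }
}
```

In this model, a maxTerms of 0 corresponds to the empty clause list and the null result from the original report, while a nonzero value lets clauses be built for the per-field queries.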

If you could test the patch, that would be great.

--
Steve
www.lucidworks.com

> On Sep 29, 2016, at 11:24 AM, Adrien Grand <jp...@gmail.com> wrote:
> 
> I'm not very familiar with this part of the code base so I could easily
> overlook something. Maybe you can open a JIRA and attach a minimal test
> case that reproduces the issue?
> 
> On Mon, 19 Sept 2016 at 13:48, Oliver Kaleske <Ol...@ptvgroup.com>
> wrote:
> 
>> Hi,
>> 
>> While updating Lucene from 6.1.0 to 6.2.0, I came across the following:
>> 
>> We have a subclass of MultiFieldQueryParser (MFQP) for creating a custom
>> type of Query, which calls getFieldQuery() on its base class (MFQP).
>> For each of its search fields, this method creates a Query by calling
>> getFieldQuery() on QueryParserBase.
>> Ultimately, we wind up in QueryBuilder's createFieldQuery() method, which
>> depending on the number of tokens (etc.) decides what type of Query to
>> return: a TermQuery, BooleanQuery, PhraseQuery, or MultiPhraseQuery.
>> 
>> Back in MFQP.getFieldQuery(), a variable maxTerms is determined depending
>> on the type of Query returned: for a TermQuery or a BooleanQuery, its value
>> will in general be nonzero, clauses are created, and a non-null Query is
>> returned.
>> However, other Query subclasses result in maxTerms=0, an empty list of
>> clauses, and finally null is returned.
>> 
>> To me, this seems like a bug, but I may well be missing something.
>> The comment "// happens for stopwords" on the return null statement,
>> however, seems to suggest that Query types other than TermQuery and
>> BooleanQuery were not considered properly here.
>> I should point out that our custom MFQP subclass so far does some rather
>> unsophisticated tokenization before calling getFieldQuery() on each token,
>> so characters like '*' may still slip through. So perhaps with proper
>> tokenization, it is guaranteed that only TermQuery and BooleanQuery can
>> come out of the chain of getFieldQuery() calls, and not handling
>> (Multi)PhraseQuery in MFQP.getFieldQuery() can never cause trouble?
>> 
>> The code in MFQP.getFieldQuery dates back to
>> LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to
>> control whether to split on whitespace prior to text analysis.  Default
>> behavior remains unchanged: split-on-whitespace=true.
>> (06 Jul 2016), when it was substantially expanded.
>> 
>> Best regards,
>> Oliver
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 


Re: null Query from MultiFieldQueryParser.getFieldQuery

Posted by Steve Rowe <sa...@gmail.com>.
Great, thanks for reporting back, Oliver!

I’ll go push the change now, including to branch_6_2, so that if we release a 6.2.2 version, it will be included there.

--
Steve
www.lucidworks.com