
Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2016/11/18 00:24:58 UTC

[jira] [Comment Edited] (LUCENE-7533) Classic query parser: autoGeneratePhraseQueries=true doesn't work when splitOnWhitespace=false

    [ https://issues.apache.org/jira/browse/LUCENE-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15675263#comment-15675263 ] 

Steve Rowe edited comment on LUCENE-7533 at 11/18/16 12:24 AM:
---------------------------------------------------------------

I committed the patch to disallow this combination of options (autoGeneratePhraseQueries=true with splitOnWhitespace=false). Hopefully this can be revisited once we unbreak graph token streams.
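The kind of guard described above can be sketched as two setters that each check the other option, so the forbidden pair can never be reached regardless of call order. This is a minimal, self-contained sketch under stated assumptions: ParserOptions, its defaults, and its exception messages are hypothetical and are not Lucene's QueryParser code; the committed patch may structure the check differently.

```java
// Hypothetical, simplified stand-in for the two classic query parser options.
// Not Lucene's QueryParser; it only models the "disallow this combination" guard.
final class ParserOptions {
    private boolean splitOnWhitespace = true;          // assumed default for this sketch
    private boolean autoGeneratePhraseQueries = false; // assumed default for this sketch

    void setSplitOnWhitespace(boolean value) {
        // Reject turning off whitespace splitting while auto-phrasing is on.
        if (!value && autoGeneratePhraseQueries) {
            throw new IllegalArgumentException(
                "splitOnWhitespace=false is disallowed when autoGeneratePhraseQueries=true");
        }
        splitOnWhitespace = value;
    }

    void setAutoGeneratePhraseQueries(boolean value) {
        // Symmetric check: reject enabling auto-phrasing while whitespace splitting is off.
        if (value && !splitOnWhitespace) {
            throw new IllegalArgumentException(
                "autoGeneratePhraseQueries=true is disallowed when splitOnWhitespace=false");
        }
        autoGeneratePhraseQueries = value;
    }

    boolean getSplitOnWhitespace() { return splitOnWhitespace; }

    boolean getAutoGeneratePhraseQueries() { return autoGeneratePhraseQueries; }

    public static void main(String[] args) {
        ParserOptions opts = new ParserOptions();
        opts.setAutoGeneratePhraseQueries(true);  // allowed: still splitting on whitespace
        try {
            opts.setSplitOnWhitespace(false);     // rejected: forbidden combination
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because both setters check, the second call that would complete the forbidden pair is the one that fails, and the previously set option is left unchanged.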



> Classic query parser: autoGeneratePhraseQueries=true doesn't work when splitOnWhitespace=false
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7533
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7533
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 6.2, 6.3, 6.2.1
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>             Fix For: master (7.0), 6.4
>
>         Attachments: LUCENE-7533-disallow-option-combo.patch, LUCENE-7533.patch
>
>
> LUCENE-2605 introduced an option for the classic query parser to not split on whitespace before performing analysis.
> From the javadocs for QueryParser.setAutoGeneratePhraseQueries():
>     "phrase queries will be automatically generated when the analyzer returns more than one term from whitespace delimited text."
> When splitOnWhitespace=false, the output from analysis can now span multiple whitespace-separated tokens, which breaks an assumption the code makes when autoGeneratePhraseQueries=true: that all of the terms came from a single whitespace-delimited chunk. For this combination of options, it's not appropriate to auto-quote multiple non-overlapping tokens produced by analysis. For example, simple whitespace tokenization over the query "some words" produces the token sequence ("some", "words"), and even with autoGeneratePhraseQueries=true, we should not create a phrase query here.
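To see why the combination breaks, here is a toy Java model of the javadoc rule: "more than one term from one analyzed unit becomes a phrase." Everything in it is hypothetical: AutoPhraseDemo, the analyzer that splits on whitespace and hyphens, and the TERM/PHRASE strings stand in for Lucene's analysis chain and query objects, and the model ignores everything else the classic parser does. The only thing it varies is which unit of text gets analyzed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of the auto-phrase rule; not Lucene code.
final class AutoPhraseDemo {

    // Hypothetical analyzer: lowercases and splits on whitespace and hyphens.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[\\s-]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Naive rule: more than one term from one analyzed unit -> phrase.
    static String clause(List<String> terms, boolean autoGeneratePhraseQueries) {
        if (autoGeneratePhraseQueries && terms.size() > 1) {
            return "PHRASE(\"" + String.join(" ", terms) + "\")";
        }
        List<String> parts = new ArrayList<>();
        for (String t : terms) parts.add("TERM(" + t + ")");
        return String.join(" ", parts);
    }

    static String parse(String query, boolean splitOnWhitespace, boolean autoGeneratePhraseQueries) {
        if (!splitOnWhitespace) {
            // Whole query analyzed at once: the analyzed unit may span several words.
            return clause(analyze(query), autoGeneratePhraseQueries);
        }
        // Each whitespace-delimited chunk is analyzed on its own.
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.add(clause(analyze(chunk), autoGeneratePhraseQueries));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(parse("wi-fi router", true, true)); // PHRASE("wi fi") TERM(router)
        System.out.println(parse("some words", true, true));   // TERM(some) TERM(words)
        System.out.println(parse("some words", false, true));  // PHRASE("some words"), the unwanted phrase
    }
}
```

With whitespace splitting, only a single chunk such as "wi-fi" can yield several terms, so auto-quoting matches the intent of the option. Without it, the analyzed unit is the whole query, and the same rule wraps independent words in a phrase, which is the mismatch behind disallowing the combination.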



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
