You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2016/05/11 03:09:13 UTC

[jira] [Updated] (LUCENE-2605) queryparser parses on whitespace

     [ https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated LUCENE-2605:
-------------------------------
    Attachment: LUCENE-2605.patch

Initial patch against the classic Lucene QueryParser grammar only.  Runs of whitespace-separated text with no operators are now sent as a single input to the analyzer, rather than first splitting on whitespace. 

There are 10 failing tests in the queryparser module (didn't try anywhere else yet), mostly caused by the way getFieldQuery() produces a nested query when there are multiple terms.  I'll look into how to address that.


> queryparser parses on whitespace
> --------------------------------
>
>                 Key: LUCENE-2605
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2605
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>            Reporter: Robert Muir
>            Assignee: Steve Rowe
>             Fix For: 4.9, 6.0
>
>         Attachments: LUCENE-2605.patch
>
>
> The queryparser parses input on whitespace, and sends each whitespace separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across whitespace boundaries:
> * n-gram analysis
> * shingles 
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. vietnamese)
> Its also rather unexpected, as users think their charfilters/tokenizers/tokenfilters will do the same thing at index and querytime, but
> in many cases they can't. Instead, preferably the queryparser would parse around only real 'operators'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org