Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2014/03/16 05:47:45 UTC

[jira] [Updated] (LUCENE-4382) Unicode escape no longer works for non-suffix-only wildcard terms

     [ https://issues.apache.org/jira/browse/LUCENE-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-4382:
---------------------------------

    Fix Version/s:     (was: 4.7)
                   4.8

> Unicode escape no longer works for non-suffix-only wildcard terms
> -----------------------------------------------------------------
>
>                 Key: LUCENE-4382
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4382
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 4.0-BETA
>            Reporter: Jack Krupansky
>             Fix For: 4.8
>
>
> LUCENE-588 added support for escaping wildcard characters, but when the de-escaping logic was pushed down from the query parser (QueryParserBase) into WildcardQuery, support for Unicode escapes (a backslash, "u", and four hex digits giving the character code) was not carried over.
> Two solutions:
> 1. Do the Unicode de-escaping in the query parser before calling getWildcardQuery.
> 2. Support Unicode de-escaping in WildcardQuery.
> A suffix-only wildcard does not exhibit this problem because full de-escaping is performed in the query parser before calling getPrefixQuery.
> My test case, added at the beginning of TestExtendedDismaxParser.testFocusQueryParser:
> {code}
>     assertQ("expected doc is missing (using escaped edismax w/field)",
>         req("q", "t_special:literal\\:\\u0063olo*n",
>             "defType", "edismax"),
>         "//doc[1]/str[@name='id'][.='46']");
> {code}
> Note: That test case was used only to step into WildcardQuery in a debugger and confirm that the Unicode escape is not processed correctly. The assertion fails in all cases, but only because of how the field type is analyzed.
> Here is a Lucene-level test case that can also be debugged to see that WildcardQuery is not processing the Unicode escape properly. I added it at the start of TestMultiAnalyzer.testMultiAnalyzer:
> {code}
>     assertEquals("literal\\:\\u0063olo*n", qp.parse("literal\\:\\u0063olo*n").toString());
> {code}
> Note: This assertion always passes, because it checks only the pattern string passed in to WildcardQuery, not how WildcardQuery de-escapes it.
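
For illustration only, solution 1 in the description above could be sketched roughly as follows. This is not the committed fix and not Lucene's API: the class UnicodeUnescapeSketch and the method unescapeUnicodeOnly are hypothetical names, and the code assumes that '*' and '?' are the wildcard characters and that backslash is the escape character. It decodes only \uXXXX sequences and leaves every other backslash escape in place, so that a later de-escaping step still sees them. Two further assumptions: a decoded character that is itself '*', '?' or a backslash is re-escaped so that it stays literal, and a malformed \u sequence is passed through unchanged rather than rejected.

```java
// Hypothetical sketch, NOT Lucene code. Decodes only backslash-u-XXXX
// sequences and copies every other backslash escape through untouched.
final class UnicodeUnescapeSketch {

    private UnicodeUnescapeSketch() {}

    static String unescapeUnicodeOnly(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != '\\' || i + 1 >= input.length()) {
                // Ordinary character, or a lone trailing backslash.
                out.append(c);
                i++;
            } else if (input.charAt(i + 1) == 'u' && hasFourHexDigits(input, i + 2)) {
                // Backslash, 'u', four hex digits: decode to a single char.
                char decoded = (char) Integer.parseInt(input.substring(i + 2, i + 6), 16);
                // Assumption: keep decoded wildcard/escape characters literal
                // by re-escaping them for the later wildcard de-escaping step.
                if (decoded == '*' || decoded == '?' || decoded == '\\') {
                    out.append('\\');
                }
                out.append(decoded);
                i += 6;
            } else {
                // Any other escape (including a malformed Unicode escape)
                // is copied through as-is: the backslash and the next char.
                out.append(c).append(input.charAt(i + 1));
                i += 2;
            }
        }
        return out.toString();
    }

    // True if s has four hexadecimal digits starting at index start.
    private static boolean hasFourHexDigits(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Applied to the pattern from the test cases above, literal\:\u0063olo*n becomes literal\:colo*n: the escaped colon is left for the later step, and the unescaped '*' still acts as a wildcard.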


