Posted to dev@lucene.apache.org by "Michael Busch (JIRA)" <ji...@apache.org> on 2009/01/27 02:14:59 UTC

[jira] Commented: (LUCENE-1528) Add support for Ideographic Space to the queryparser - also known as fullwidth space and wide-space

    [ https://issues.apache.org/jira/browse/LUCENE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667538#action_12667538 ] 

Michael Busch commented on LUCENE-1528:
---------------------------------------

Looks good, Luis!

I was just wondering whether you could do something like the following to avoid defining the whitespace chars in two places:
{noformat}
| <#_WHITESPACE: ( " " | "\t" | "\n" | "\r") >
| <#_TERM_START_CHAR: ( ~( <_WHITESPACE> | [ "+", "-", "!", "(", ")", ":", "^",
                           "[", "]", "\"", "{", "}", "~", "*", "?", "\\" ])
                       | <_ESCAPED_CHAR> ) >
{noformat}

This does not compile. Is there another way to achieve this in JavaCC?
If not, it's not a big deal and I can commit this patch as is.
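In JavaCC the `~` complement is written in front of a bracketed character list, which is consistent with `~( <_WHITESPACE> | [...] )` above failing to compile. Outside the grammar, the single-definition idea itself is easy to express. The following is a hypothetical Java sketch, not Lucene code and not part of the attached patch: the whitespace set (including U+3000) is written once, and the term-start check is derived from it as "neither whitespace nor a query-syntax character". The class and constant names are illustrative assumptions, and escaped characters are deliberately left out.

```java
/**
 * Hypothetical sketch, not Lucene code: the query-syntax character classes
 * are defined once and the term-start check is derived from them.
 */
public class QuerySyntaxChars {
    // Whitespace set defined in one place, including U+3000 (ideographic space).
    static final String WHITESPACE = " \t\n\r\u3000";

    // Characters excluded from _TERM_START_CHAR in the grammar snippet.
    static final String SPECIAL = "+-!():^[]\"{}~*?\\";

    static boolean isWhitespace(char c) {
        return WHITESPACE.indexOf(c) >= 0;
    }

    // A term may start with any char that is neither whitespace nor special.
    // Escaped characters (backslash followed by any char) are not handled here.
    static boolean isTermStartChar(char c) {
        return !isWhitespace(c) && SPECIAL.indexOf(c) < 0;
    }

    public static void main(String[] args) {
        System.out.println(isTermStartChar('a'));      // true
        System.out.println(isTermStartChar('\u3000')); // false
        System.out.println(isTermStartChar('+'));      // false
    }
}
```

With this layout, supporting another space character means editing `WHITESPACE` only.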

> Add support for Ideographic Space to the queryparser - also known as fullwidth space and wide-space
> ---------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1528
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1528
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4.1
>            Reporter: Luis Alves
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.4.1
>
>         Attachments: lucene_wide_space_v1_src.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The Ideographic Space is a space character that is as wide as a normal CJK character cell.
> It is also known as wide-space or fullwidth space. This type of space is used in CJK languages.
> This patch adds support for the wide space, making the queryparser component more friendly
> to queries that contain CJK text.
> Reference:
> 'http://en.wikipedia.org/wiki/Space_(punctuation)' - see Table of spaces, char U+3000.
> I also added a new test case that fails without the patch.
> With the patch applied, all JUnit tests pass.
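To illustrate the kind of failure described above, here is a hypothetical Java sketch (not the patch and not Lucene's QueryParser) that splits a query string into terms on a given whitespace set. With only the ASCII whitespace characters, two CJK words separated by U+3000 come back as a single term; adding U+3000 to the set yields two terms. The names `WideSpaceSplit`, `split`, `ASCII_WHITESPACE` and `WITH_IDEOGRAPHIC_SPACE` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Lucene code: splits a query on a configurable
 * set of whitespace characters to show why U+3000 must be in that set.
 */
public class WideSpaceSplit {
    static final String ASCII_WHITESPACE = " \t\n\r";
    static final String WITH_IDEOGRAPHIC_SPACE = ASCII_WHITESPACE + "\u3000";

    // Returns the maximal runs of characters that are not in `whitespace`.
    static List<String> split(String query, String whitespace) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (whitespace.indexOf(c) >= 0) {
                if (current.length() > 0) {
                    terms.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Two CJK words separated by an ideographic space (U+3000).
        String query = "\u4E2D\u6587\u3000\u65E5\u672C";
        System.out.println(split(query, ASCII_WHITESPACE).size());       // 1
        System.out.println(split(query, WITH_IDEOGRAPHIC_SPACE).size()); // 2
        // The JDK already classifies U+3000 as whitespace.
        System.out.println(Character.isWhitespace('\u3000'));            // true
    }
}
```

For comparison, `Character.isWhitespace` returns true for U+3000 because it is a Unicode space separator.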
