Posted to dev@lucene.apache.org by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2005/10/05 05:54:47 UTC

[jira] Resolved: (LUCENE-444) StandardTokenizer loses Korean characters

     [ http://issues.apache.org/jira/browse/LUCENE-444?page=all ]
     
Otis Gospodnetic resolved LUCENE-444:
-------------------------------------

    Fix Version: 1.9
     Resolution: Fixed

Committed.  Thanks Cheolgoo.

> StandardTokenizer loses Korean characters
> -----------------------------------------
>
>          Key: LUCENE-444
>          URL: http://issues.apache.org/jira/browse/LUCENE-444
>      Project: Lucene - Java
>         Type: Bug
>   Components: Analysis
>     Reporter: Cheolgoo Kang
>     Priority: Minor
>      Fix For: 1.9
>  Attachments: StandardTokenizer_Korean.patch
>
> While using StandardAnalyzer, esp. StandardTokenizer, with a Korean text stream, StandardTokenizer ignores the Korean characters. This is because the definition of the CJK token in the StandardTokenizer.jj JavaCC file doesn't cover a wide enough range to include the Korean syllables described in the Unicode character map.
> This patch adds one line covering 0xAC00~0xD7AF, the Korean syllables range, to the StandardTokenizer.jj code.
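The patch itself modifies the JavaCC grammar, but the underlying check is simple: the Hangul Syllables block U+AC00..U+D7AF must be treated as CJK characters so the tokenizer keeps them. The sketch below is an illustrative standalone Java class (not the actual StandardTokenizer.jj grammar) showing that Korean syllables fall inside the range the patch adds:

```java
// Illustrative sketch of the range added by the patch: the Unicode
// Hangul Syllables block U+AC00..U+D7AF. Characters in this range
// are the precomposed Korean syllables that StandardTokenizer was
// dropping before the fix. This class is a hypothetical helper for
// demonstration, not code from the patch.
public class HangulRange {
    // True if c lies in the Hangul Syllables block covered by the patch.
    static boolean inHangulSyllables(char c) {
        return c >= '\uAC00' && c <= '\uD7AF';
    }

    public static void main(String[] args) {
        String korean = "\uD55C\uAE00"; // "한글" (the word "Hangul")
        for (char c : korean.toCharArray()) {
            // Both syllables fall inside U+AC00..U+D7AF.
            System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                    + " in Hangul Syllables: " + inHangulSyllables(c));
        }
    }
}
```

With the range added to the CJK token definition, these characters are emitted as tokens instead of being silently discarded.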

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org