Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/04/22 22:07:51 UTC
[jira] Resolved: (LUCENE-1032) CJKAnalyzer should convert half width katakana to full width katakana
[ https://issues.apache.org/jira/browse/LUCENE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-1032.
---------------------------------
Resolution: Duplicate
Normalization is implemented in LUCENE-2399.
> CJKAnalyzer should convert half width katakana to full width katakana
> ---------------------------------------------------------------------
>
> Key: LUCENE-1032
> URL: https://issues.apache.org/jira/browse/LUCENE-1032
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 2.0.0
> Reporter: Andrew Lynch
>
> Some of our Japanese customers are reporting problems when searching with half-width characters.
> The desired behavior is that a document containing half-width characters is returned both when searching with their full-width equivalents and when searching with the half-width characters themselves.
> Currently, a search for half-width characters returns no matches.
> Here is a test case outlining the desired behavior (this may require a new Analyzer):
> {code}
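> import java.io.IOException;
> import java.io.Reader;
> import java.io.StringReader;
> import java.io.UnsupportedEncodingException;
>
> import junit.framework.TestCase;
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.cjk.CJKAnalyzer;
>
> // Written against the Lucene 2.x API (TokenStream.next(), Token.termText()).
> public class TestJapaneseEncodings extends TestCase
> {
>     // U+30AB KATAKANA LETTER KA, UTF-8 encoded
>     byte[] fullWidthKa = new byte[]{(byte) 0xE3, (byte) 0x82, (byte) 0xAB};
>     // U+FF76 HALFWIDTH KATAKANA LETTER KA, UTF-8 encoded
>     byte[] halfWidthKa = new byte[]{(byte) 0xEF, (byte) 0xBD, (byte) 0xB6};
>
>     // Desired: half-width input is emitted as the full-width token.
>     public void testAnalyzerWithHalfWidth() throws IOException
>     {
>         Reader r1 = new StringReader(makeHalfWidthKa());
>         TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
>         assertNotNull(stream);
>         Token token = stream.next();
>         assertNotNull(token);
>         assertEquals(makeFullWidthKa(), token.termText());
>     }
>
>     // Full-width input should pass through unchanged.
>     public void testAnalyzerWithFullWidth() throws IOException
>     {
>         Reader r1 = new StringReader(makeFullWidthKa());
>         TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
>         assertEquals(makeFullWidthKa(), stream.next().termText());
>     }
>
>     private String makeFullWidthKa() throws UnsupportedEncodingException
>     {
>         return new String(fullWidthKa, "UTF-8");
>     }
>
>     private String makeHalfWidthKa() throws UnsupportedEncodingException
>     {
>         return new String(halfWidthKa, "UTF-8");
>     }
> }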
> {code}
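The issue asks for half-width and full-width katakana to match each other. One standard way to get that behavior is Unicode NFKC normalization, applied identically to indexed terms and query terms. The sketch below is only an illustration of that idea under stated assumptions: it does not use Lucene at all, it assumes Java 6 or later and calls the JDK's java.text.Normalizer, and the class KanaWidthFold with its fold and matches methods are hypothetical names. It is not the LUCENE-2399 implementation.

```java
import java.text.Normalizer;

/**
 * Hypothetical sketch: folds half-width katakana to full-width by applying
 * Unicode NFKC normalization with the JDK's java.text.Normalizer.
 * Not Lucene's implementation.
 */
final class KanaWidthFold {

    // NFKC replaces compatibility characters such as U+FF76 HALFWIDTH KATAKANA
    // LETTER KA with U+30AB KATAKANA LETTER KA, and composes a half-width letter
    // followed by U+FF9E (half-width voiced sound mark) into one precomposed
    // full-width letter.
    static String fold(String term) {
        return Normalizer.normalize(term, Normalizer.Form.NFKC);
    }

    // Hypothetical matching rule: fold both the indexed term and the query
    // term the same way, then compare the results.
    static boolean matches(String indexedTerm, String queryTerm) {
        return fold(indexedTerm).equals(fold(queryTerm));
    }

    public static void main(String[] args) {
        String fullWidthKa = "\u30AB"; // same character as the bytes 0xE3 0x82 0xAB in the test
        String halfWidthKa = "\uFF76"; // same character as the bytes 0xEF 0xBD 0xB6 in the test

        System.out.println(fold(halfWidthKa).equals(fullWidthKa)); // true
        System.out.println(matches(halfWidthKa, fullWidthKa));     // true
        System.out.println(matches(halfWidthKa, halfWidthKa));     // true
        // Half-width KA + half-width voiced mark -> full-width GA (U+30AC)
        System.out.println(fold("\uFF76\uFF9E").equals("\u30AC")); // true
    }
}
```

Note that NFKC does more than width folding: it also maps full-width Latin letters and digits to ASCII and expands other compatibility characters (for example, the ligature U+FB01 becomes "fi"). If only katakana width should change, a narrower, table-driven mapping would be needed.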
--
This message is automatically generated by JIRA.