Posted to dev@lucene.apache.org by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2008/05/14 07:39:55 UTC
[jira] Resolved: (LUCENE-1003) [PATCH] RussianAnalyzer's tokenizer skips numbers from input text,
[ https://issues.apache.org/jira/browse/LUCENE-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Otis Gospodnetic resolved LUCENE-1003.
--------------------------------------
Resolution: Fixed
> [PATCH] RussianAnalyzer's tokenizer skips numbers from input text,
> ------------------------------------------------------------------
>
> Key: LUCENE-1003
> URL: https://issues.apache.org/jira/browse/LUCENE-1003
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 2.2
> Reporter: TUSUR OpenTeam
> Assignee: Otis Gospodnetic
> Attachments: RussianCharsets.java.patch, TestRussianAnalyzer.java.patch
>
>
> RussianAnalyzer's tokenizer skips numbers in the input text, so the resulting token stream is missing them. The problem can be solved by adding the digits 0-9 to RussianCharsets.UnicodeRussian. See the test case below for details.
> {code:title=TestRussianAnalyzer.java|borderStyle=solid}
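> import java.io.IOException;
> import java.io.Reader;
> import java.io.StringReader;
> import junit.framework.TestCase;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.ru.RussianAnalyzer;
> import org.apache.lucene.analysis.ru.RussianCharsets;
>
> public class TestRussianAnalyzer extends TestCase {
>     // Input with one word and one number; both should become tokens.
>     Reader reader = new StringReader("text 1000");
>
>     // test FAILS: the default charset contains no digits, so "1000" is dropped
>     public void testStemmer() {
>         testAnalyzer(new RussianAnalyzer());
>     }
>
>     // test PASSES: the charset is extended with the digits 0-9
>     public void testFixedRussianAnalyzer() {
>         testAnalyzer(new RussianAnalyzer(getRussianCharSet()));
>     }
>
>     private void testAnalyzer(RussianAnalyzer analyzer) {
>         try {
>             TokenStream stream = analyzer.tokenStream("text", reader);
>             assertEquals("text", stream.next().termText());
>             // A second token must exist for the number.
>             assertNotNull(stream.next());
>         } catch (IOException e) {
>             fail(e.getMessage());
>         }
>     }
>
>     // Returns a copy of RussianCharsets.UnicodeRussian with '0'..'9' appended.
>     private char[] getRussianCharSet() {
>         int length = RussianCharsets.UnicodeRussian.length;
>         final char[] russianChars = new char[length + 10];
>         System.arraycopy(RussianCharsets.UnicodeRussian, 0, russianChars, 0, length);
>         russianChars[length++] = '0';
>         russianChars[length++] = '1';
>         russianChars[length++] = '2';
>         russianChars[length++] = '3';
>         russianChars[length++] = '4';
>         russianChars[length++] = '5';
>         russianChars[length++] = '6';
>         russianChars[length++] = '7';
>         russianChars[length++] = '8';
>         russianChars[length] = '9';
>         return russianChars;
>     }
> }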
> {code}
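For readers without a Lucene 2.2 checkout, the effect described in the report can be reproduced with a small self-contained sketch. This is not Lucene code: the class CharsetTokenizerSketch, its methods, and the abbreviated LETTERS_ONLY array are hypothetical stand-ins. It assumes that the tokenizer keeps a character only if it is a letter or appears in the charset array, and treats every other character as a separator.

```java
import java.util.ArrayList;
import java.util.List;

public class CharsetTokenizerSketch {

    // Hypothetical, abbreviated charset; NOT the real RussianCharsets.UnicodeRussian.
    static final char[] LETTERS_ONLY = {'\u0430', '\u0431', '\u0432'};

    // Returns a copy of base with the digits '0'..'9' appended.
    static char[] withDigits(char[] base) {
        char[] extended = new char[base.length + 10];
        System.arraycopy(base, 0, extended, 0, base.length);
        for (int i = 0; i < 10; i++) {
            extended[base.length + i] = (char) ('0' + i);
        }
        return extended;
    }

    // Assumed rule: a character belongs to a token if it is a letter
    // or is listed in the charset.
    static boolean isTokenChar(char c, char[] charset) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char allowed : charset) {
            if (c == allowed) {
                return true;
            }
        }
        return false;
    }

    // Splits input into maximal runs of token characters.
    static List<String> tokenize(String input, char[] charset) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (isTokenChar(c, charset)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("text 1000", LETTERS_ONLY));             // [text]
        System.out.println(tokenize("text 1000", withDigits(LETTERS_ONLY))); // [text, 1000]
    }
}
```

With the letters-only charset the digits act as separators and "1000" never becomes a token; once the digits are appended, the same input yields both tokens, which is the behavior the second test in the report checks for.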