Posted to issues@hivemall.apache.org by "Satoshi Iijima (JIRA)" <ji...@apache.org> on 2018/06/27 07:03:00 UTC

[jira] [Created] (HIVEMALL-208) tokenize_ja failed to analyze certain Japanese strings

Satoshi Iijima created HIVEMALL-208:
---------------------------------------

             Summary: tokenize_ja failed to analyze certain Japanese strings
                 Key: HIVEMALL-208
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-208
             Project: Hivemall
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Satoshi Iijima


tokenize_ja failed to analyze certain Japanese strings and output the following error.
{panel}
java.lang.ArrayIndexOutOfBoundsException: -1
 at org.apache.lucene.analysis.ja.JapaneseTokenizer.backtrace(JapaneseTokenizer.java:1024)
 at org.apache.lucene.analysis.ja.JapaneseTokenizer.parse(JapaneseTokenizer.java:873)
 at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(JapaneseTokenizer.java:474)
 at org.apache.lucene.analysis.ja.JapaneseBaseFormFilter.incrementToken(JapaneseBaseFormFilter.java:50)
 at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
 at org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63)
 at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
 at org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.incrementToken(JapaneseKatakanaStemFilter.java:63)
 at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:45)
 at hivemall.nlp.tokenizer.KuromojiUDF.analyzeTokens(KuromojiUDF.java:292)
 at hivemall.nlp.tokenizer.KuromojiUDF.evaluate(KuromojiUDF.java:117)
{panel}
The cause is LUCENE-7279, which has already been fixed in Lucene, so the Lucene dependency needs to be upgraded.
The affected versions include not only v0.5.0 but also v0.4.2.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)