Posted to issues@hivemall.apache.org by "Satoshi Iijima (JIRA)" <ji...@apache.org> on 2018/06/27 07:03:00 UTC
[jira] [Created] (HIVEMALL-208) tokenize_ja failed to analyze certain Japanese strings
Satoshi Iijima created HIVEMALL-208:
---------------------------------------
Summary: tokenize_ja failed to analyze certain Japanese strings
Key: HIVEMALL-208
URL: https://issues.apache.org/jira/browse/HIVEMALL-208
Project: Hivemall
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Satoshi Iijima
tokenize_ja failed to analyze certain Japanese strings and output the following error:
{panel}
java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.lucene.analysis.ja.JapaneseTokenizer.backtrace(JapaneseTokenizer.java:1024)
at org.apache.lucene.analysis.ja.JapaneseTokenizer.parse(JapaneseTokenizer.java:873)
at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(JapaneseTokenizer.java:474)
at org.apache.lucene.analysis.ja.JapaneseBaseFormFilter.incrementToken(JapaneseBaseFormFilter.java:50)
at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
at org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63)
at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
at org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.incrementToken(JapaneseKatakanaStemFilter.java:63)
at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:45)
at hivemall.nlp.tokenizer.KuromojiUDF.analyzeTokens(KuromojiUDF.java:292)
at hivemall.nlp.tokenizer.KuromojiUDF.evaluate(KuromojiUDF.java:117)
{panel}
The cause is LUCENE-7279, which has already been fixed. Lucene needs to be upgraded.
The affected versions include not only v0.5.0 but also v0.4.2.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)