Posted to issues@lucene.apache.org by "Michael Sokolov (Jira)" <ji...@apache.org> on 2019/11/25 17:44:00 UTC

[jira] [Comment Edited] (LUCENE-9064) Can we remove the FST cache in Kuromoji and Nori analyzers?

    [ https://issues.apache.org/jira/browse/LUCENE-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981739#comment-16981739 ] 

Michael Sokolov edited comment on LUCENE-9064 at 11/25/19 5:43 PM:
-------------------------------------------------------------------

[~bruno.roustant] there is TestJapaneseTokenizer.testWikipedia (commented out). To get it running, you must download jawiki from Wikipedia and edit the test to point at the file you downloaded. You might also have to disable the security manager checks that prevent reading from arbitrary places in the filesystem.


was (Author: sokolov):
[~bruno.roustant] there is TestJapaneseTokenizer.testWikipedia. To get it running, you must download jawiki from Wikipedia and edit the test to point at the file you downloaded. You might also have to disable the security manager checks that prevent reading from arbitrary places in the filesystem.

> Can we remove the FST cache in Kuromoji and Nori analyzers?
> -----------------------------------------------------------
>
>                 Key: LUCENE-9064
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9064
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Priority: Minor
>
> Is the ~30k Han cache in Kuromoji redundant after LUCENE-8920?
> [https://github.com/apache/lucene-solr/blob/813ca77250db29116812bc949e2a466a70f969a3/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoFST.java#L35-L38]
> The linked file exists almost entirely to implement this caching, so if it's not needed anymore, removing it would be a nice cleanup. But it was definitely needed for good performance before, so we should be careful. The Nori analyzer has the exact same thing (the file has the same name) for ~10k Hangul syllables.
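
For readers unfamiliar with the kind of cache being discussed, here is a minimal, self-contained sketch of the general idea: precompute the lookup result for every character in one contiguous range and store it in an array, so the first step of a dictionary lookup becomes a single array index. This is not the code in TokenInfoFST and does not use Lucene's FST API. The class name, the Arc stand-in, the HashMap "slow path", and the range bounds are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration only; NOT Lucene's TokenInfoFST and not its API. */
public class RootArcCacheSketch {
    // Hypothetical range bounds, chosen only for this example.
    static final int CACHE_FLOOR = 0x4E00;
    static final int CACHE_CEILING = 0x9FFF;

    /** Hypothetical stand-in for an arc leaving the root node. */
    static final class Arc {
        final int label;   // the character this arc consumes
        final int target;  // id of the node the arc leads to
        Arc(int label, int target) {
            this.label = label;
            this.target = target;
        }
    }

    // Stand-in for the "slow" general-purpose root lookup.
    private final Map<Integer, Arc> rootArcs = new HashMap<>();

    // One slot per character in [CACHE_FLOOR, CACHE_CEILING]; null means "no arc".
    private final Arc[] cache = new Arc[1 + CACHE_CEILING - CACHE_FLOOR];

    void addRootArc(int label, int target) {
        rootArcs.put(label, new Arc(label, target));
    }

    /** Fills the array once; call after all root arcs are added, or the cache is stale. */
    void buildCache() {
        for (int i = 0; i < cache.length; i++) {
            cache[i] = rootArcs.get(CACHE_FLOOR + i);
        }
    }

    /** Returns the root arc for ch, or null if there is none. */
    Arc findRootArc(int ch) {
        if (ch >= CACHE_FLOOR && ch <= CACHE_CEILING) {
            return cache[ch - CACHE_FLOOR]; // fast path: direct array index
        }
        return rootArcs.get(ch);            // fallback: general lookup
    }
}
```

The trade-off is extra memory (one array slot per character in the range, used or not) in exchange for skipping the general lookup on the first character. If the underlying structure already resolves root arcs by direct indexing, a separate array like this may add little, which is the question the issue raises.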


