Posted to dev@lucenenet.apache.org by "Shad Storhaug (JIRA)" <ji...@apache.org> on 2017/07/23 23:40:00 UTC

[jira] [Updated] (LUCENENET-567) Port Lucene.Net.Analysis.Kuromoji

     [ https://issues.apache.org/jira/browse/LUCENENET-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shad Storhaug updated LUCENENET-567:
------------------------------------
    Attachment: mecab-ipadic-2.7.0-20070801.tar.gz

I posted a comment here: https://issues.apache.org/jira/browse/LUCENE-3305?focusedCommentId=16097465&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16097465 and also contacted the Kuromoji project owners to see if they could help out. However, I have received no response so far.

Fortunately, I was able to find [this blog post](http://mentaldetritus.blogspot.com/2013/03/compiling-custom-dictionary-for.html), which links to some files (attached) that can be used to check that the I/O code doesn't just blow up.

I used this data to create a smoke test. Hopefully, someday the Kuromoji team will add some real tests to Lucene so we can verify automatically instead of manually that the binary format works.

I also modified the way the files are loaded so they can be overridden by dropping them into a subdirectory of the application named {{kuromoji-data}}. If that directory exists, the files will be loaded from it instead of the embedded resources. This is better than the option that Lucene provided, which requires you to recompile the assembly in order to change the dictionary.
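
The lookup order described above can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the actual Lucene.Net code: the function name {{open_dictionary_file}}, the {{load_embedded}} callback, and the file name in the usage note are hypothetical. Only the {{kuromoji-data}} directory name comes from the change itself. The sketch assumes that when the directory exists, every file is read from it, with no per-file fallback to the embedded copy.

```python
from pathlib import Path
from typing import Callable

# Directory name taken from the comment above; everything else is illustrative.
DATA_DIR_NAME = "kuromoji-data"


def open_dictionary_file(
    app_dir: Path,
    file_name: str,
    load_embedded: Callable[[str], bytes],
) -> bytes:
    """Return the bytes of a dictionary file, preferring an on-disk override.

    app_dir       -- the application's base directory (assumed location)
    file_name     -- name of the dictionary file to load
    load_embedded -- hypothetical stand-in for reading an embedded resource
    """
    override_dir = app_dir / DATA_DIR_NAME
    if override_dir.is_dir():
        # The override directory exists, so load from it instead of the
        # embedded resources. A missing file raises FileNotFoundError here;
        # that behavior is an assumption of this sketch.
        return (override_dir / file_name).read_bytes()
    # No override directory: fall back to the embedded copy.
    return load_embedded(file_name)
```

For example, calling {{open_dictionary_file(Path("."), "example.dat", read_embedded)}} would return the embedded bytes unless a {{kuromoji-data}} directory sits next to the application, in which case the file is read from that directory.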

> Port Lucene.Net.Analysis.Kuromoji
> ---------------------------------
>
>                 Key: LUCENENET-567
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-567
>             Project: Lucene.Net
>          Issue Type: Task
>          Components: Lucene.Net.Analysis.Kuromoji
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Shad Storhaug
>            Assignee: Shad Storhaug
>            Priority: Minor
>              Labels: features
>             Fix For: Lucene.Net 4.8.0
>
>         Attachments: mecab-ipadic-2.7.0-20070801.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Support for Analysis.Kuromoji has already been added to the ByteBuffer in the Support namespace.


