Posted to issues@lucene.apache.org by "Uihyun Kim (Jira)" <ji...@apache.org> on 2022/02/09 15:38:00 UTC

[jira] [Created] (LUCENE-10416) Update Korean Dictionary for Nori

Uihyun Kim created LUCENE-10416:
-----------------------------------

             Summary: Update Korean Dictionary for Nori
                 Key: LUCENE-10416
                 URL: https://issues.apache.org/jira/browse/LUCENE-10416
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Uihyun Kim


Nori, the Korean analyzer, is built on a Korean dictionary named mecab-ko-dic, which is available under an Apache license here: [https://bitbucket.org/eunjeon/mecab-ko-dic]

Nori has not been updated to newer releases of the dictionary, even though those releases include updates that improve analysis results. Releases are available for download here: [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads]
 * Currently used in Nori: [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.0.3-20170922.tar.gz]
 * Latest: [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz]

The latest release includes the following changes relative to the version currently used in Nori (change log: [https://bitbucket.org/eunjeon/mecab-ko-dic/src/master/CHANGES.md]):
 * New feature: added semantic classes for NNG (general noun): 장소 (place), 행위 (action), 상태변화 (change of state), 정적상태 (static state)
 * Fix: corrected an unexpectedly large cost on NNG/장소 (place) entries
 * New words added
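
To inspect the new semantic classes, rows of the dictionary CSV files can be grouped by that field. The Java sketch below is illustrative only: it assumes the 12-column mecab-ko-dic CSV layout (surface, left context ID, right context ID, cost, POS tag, semantic class, final consonant, reading, type, first POS, last POS, expression); the class and method names (MecabKoDicRows, parse, nngBySemanticClass) are hypothetical and not part of Nori; and the sample rows are invented, not copied from any release. The cost column is where a change such as the NNG/장소 cost fix would show up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for inspecting mecab-ko-dic CSV rows; not part of Nori.
public class MecabKoDicRows {

    // Fields kept from one CSV row, assuming the 12-column layout:
    // surface, left ID, right ID, cost, POS tag, semantic class,
    // final consonant (T/F), reading, type, first POS, last POS, expression.
    record Entry(String surface, int leftId, int rightId, int cost,
                 String pos, String semanticClass) {}

    // Parses one row. Plain comma splitting is a simplification that
    // assumes no field contains an escaped comma.
    static Entry parse(String line) {
        String[] f = line.split(",", -1);
        if (f.length < 12) {
            throw new IllegalArgumentException(
                    "expected 12 columns, got " + f.length + ": " + line);
        }
        return new Entry(f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                Integer.parseInt(f[3]), f[4], f[5]);
    }

    // Groups NNG (general noun) rows by semantic class, in first-seen order.
    // "*" marks a row without a semantic class; other POS tags are skipped.
    static Map<String, List<Entry>> nngBySemanticClass(List<String> lines) {
        Map<String, List<Entry>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            Entry e = parse(line);
            if (e.pos().equals("NNG")) {
                groups.computeIfAbsent(e.semanticClass(), k -> new ArrayList<>()).add(e);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Invented rows for illustration: IDs and costs are NOT real values.
        List<String> rows = List.of(
                "공원,1000,2000,3000,NNG,장소,T,공원,*,*,*,*",
                "달리기,1000,2000,3100,NNG,행위,F,달리기,*,*,*,*",
                "하늘,1000,2000,2900,NNG,*,T,하늘,*,*,*,*",
                "가,1100,2100,4000,VV,*,F,가,*,*,*,*");
        nngBySemanticClass(rows).forEach((cls, entries) ->
                System.out.println(cls + " -> "
                        + entries.stream().map(Entry::surface).toList()));
    }
}
```

Running main prints one line per semantic class (장소 -> [공원], 행위 -> [달리기], * -> [하늘]); the VV row is skipped. Grouping by semantic class rather than by file keeps the check independent of how the release splits its CSV files.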

With the latest dictionary, both :lucene:analysis:nori:test and building a new binary dictionary complete without issues.

