Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2013/03/11 11:45:22 UTC

[jira] [Created] (STANBOL-980) Add Japanese Language support by using the Solr/Lucene Kuromoji Analyzer

Rupert Westenthaler created STANBOL-980:
-------------------------------------------

             Summary: Add Japanese Language support by using the Solr/Lucene Kuromoji Analyzer
                 Key: STANBOL-980
                 URL: https://issues.apache.org/jira/browse/STANBOL-980
             Project: Stanbol
          Issue Type: New Feature
          Components: Commons, Enhancer
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


The most recent Solr/Lucene versions added the Kuromoji Analyzer for Japanese. Integrating it into Stanbol will make it possible to

* index and search Entities with Japanese language labels and texts
* tokenize Japanese text
* POS-tag Japanese text
* perform NER for Persons, Organizations and Places
* lemmatize Japanese text
* tokenize labels correctly, as required for linking Japanese labels of Entities
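
As a rough illustration of the indexing and tokenization side, a Solr field type along the lines of the "text_ja" type in Solr's stock example schema could chain the Kuromoji components as sketched below. This is only a sketch and not the configuration Stanbol will necessarily ship: the field type name and the location of the stoptags file are assumptions, and the filter chain may need tuning for entity labels.

```xml
<!-- Sketch only: modeled on the "text_ja" field type from Solr's example schema. -->
<!-- The name "text_ja" and the path "lang/stoptags_ja.txt" are assumptions. -->
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Kuromoji morphological tokenizer; "search" mode also splits compounds -->
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
    <!-- Reduce inflected verbs and adjectives to their base form (lemmatization) -->
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <!-- Drop tokens whose part-of-speech tag is listed in the stoptags file -->
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt"/>
    <!-- Normalize full-width Latin and half-width Katakana variants -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Normalize common Katakana spelling variations (trailing long sound mark) -->
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The same tokenizer and its part-of-speech and base-form attributes are what the NLP processing Engine and the LabelTokenizer would presumably build on, so that labels and analyzed text are tokenized consistently.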

This will require three modules:

* an extension to the commons.solr.core module that provides the Kuromoji Analyzer as a Bundle
* an NLP processing Engine
* a LabelTokenizer implementation

In addition, a dedicated bundlelist that includes those three modules is needed. This bundlelist should be added by default to the Full Stanbol Launcher.


