Posted to dev@lucene.apache.org by "Christian Moen (JIRA)" <ji...@apache.org> on 2013/08/14 08:51:50 UTC

[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

    [ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739301#comment-13739301 ] 

Christian Moen commented on LUCENE-4956:
----------------------------------------

I've now aligned the branch with {{trunk}} and updated the example {{schema.xml}} to use {{text_ko}} naming for the Korean field type.

I've also indexed Korean Wikipedia continuously for a few hours and the JVM heap looks fine.

There are several additional things that can be done with this code, including generating the parser using JFlex at build time, fixing some of the position issues with random-blasting, cleanups, and dead-code removal.  That said, I believe the code we have is useful to Korean users as-is, and I think it's a good idea to integrate it into {{trunk}} and iterate further from there.

Please share your thoughts.  Thanks.

                
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>            Assignee: Christian Moen
>              Labels: newbie
>         Attachments: kr.analyzer.4x.tar, lucene4956.patch
>
>
> The Korean language has specific characteristics. When developing a search service with Lucene and Solr in Korean, there are some problems in searching and indexing. The Korean analyzer solves these problems with a Korean morphological analyzer. It consists of a Korean morphological analyzer, dictionaries, a Korean tokenizer, and a Korean filter. The Korean analyzer is made for Lucene and Solr. If you develop a search service with Lucene in Korean, the Korean analyzer is the best choice.

