Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2013/10/14 19:48:47 UTC

[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

    [ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794294#comment-13794294 ] 

Uwe Schindler edited comment on LUCENE-4956 at 10/14/13 5:46 PM:
-----------------------------------------------------------------

Hi,
I have seen the same code at a customer and found a big bug in FileUtils and JarResources. We should replace this code completely. It's not platform independent. We should fix the following (in my opinion) horrible code parts:
- FileUtils: The code with getProtectionDomain is very crazy... It will also never work if the JAR file is not a local file but some other kind of resource. It's also using APIs that are not intended for this use case. getProtectionDomain() is for sure not meant to be used to get the JAR file of the classloader.
- FileUtils converts the JAR file URL (from getProtectionDomain) to a filesystem path in the wrong way: We should add URL#getPath() to the forbidden APIs, it is almost always a bug!!! The code should use toURI() and then new File(uri). The other methods in FileUtils also have similar bugs or try to work around them. The whole class *must* be removed, sorry!
- JarResources is some crazy caching for resources, and in combination with FileUtils it's just wrong. It also does not scale if you create an uber-jar. The idea of this class is to avoid opening a stream again each time, so it loads all resources of the JAR file into memory. This is the wrong way to do it. Please remove this!
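
To illustrate the URL#getPath() point, here is a minimal sketch (not the code from the patch) of converting a file: URL to a File through toURI(). The class name UrlToFileSketch and the sample path are made up for illustration. URL#getPath() returns the path still percent-encoded, so a directory containing a space comes back with "%20" in it, while new File(uri) decodes it correctly.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, for illustration only; not part of the patch.
final class UrlToFileSketch {

    /** Converts a file: URL to a File via URI, which decodes percent-escapes. */
    static File toFile(URL url) throws URISyntaxException {
        // new File(URI) only accepts hierarchical file: URIs and throws
        // IllegalArgumentException for anything else (e.g. jar: or http:).
        return new File(url.toURI());
    }

    public static void main(String[] args) throws Exception {
        // Sample path with a space; the file does not need to exist.
        File original = new File("dir with spaces", "data.txt").getAbsoluteFile();
        URL url = original.toURI().toURL();

        // Wrong: the raw URL path keeps the space encoded as %20.
        System.out.println("getPath(): " + url.getPath());
        // Right: going through URI yields the real filesystem path.
        System.out.println("toFile():  " + toFile(url).getPath());
    }
}
```

Even the correct conversion only helps when the resource really is a local file, which is why loading through the classloader is preferable.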

We should remove both classes completely and load resources correctly with Class#getResourceAsStream.
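
A minimal sketch of what loading through Class#getResourceAsStream could look like. The class name DictionaryLoaderSketch, the line-oriented format, and the UTF-8 encoding are assumptions for illustration, not taken from the patch. The stream comes from the classloader, so it works the same whether the resource sits in a directory, a JAR, or an uber-jar, and nothing is cached beyond what the caller keeps.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader, for illustration only; not part of the patch.
final class DictionaryLoaderSketch {

    /**
     * Reads all lines of a classpath resource, resolved relative to the
     * package of the given class. Assumes the resource is UTF-8 text.
     */
    static List<String> readLines(Class<?> anchor, String resourceName) throws IOException {
        InputStream in = anchor.getResourceAsStream(resourceName);
        if (in == null) {
            // getResourceAsStream returns null when the resource is missing.
            throw new IOException("Resource not found: " + resourceName);
        }
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }
}
```

A caller would pass a class that lives in the same package as the dictionary files, for example readLines(SomeAnalyzerClass.class, "some.dic"), where both names are placeholders.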


> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>            Assignee: Christian Moen
>              Labels: newbie
>         Attachments: kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch
>
>
> The Korean language has specific characteristics. When developing a search service with Lucene and Solr in Korean, there are some problems in searching and indexing. The Korean analyzer solves these problems with a Korean morphological analyzer. It consists of a Korean morphological analyzer, dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene and Solr. If you develop a search service with Lucene in Korean, the Korean analyzer is the best choice.


