Posted to dev@lucene.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2015/10/08 09:46:26 UTC

[jira] [Updated] (LUCENE-6833) Upgrade morfologik to version 2.0.1, simplify MorfologikFilter's dictionary lookup

     [ https://issues.apache.org/jira/browse/LUCENE-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-6833:
--------------------------------
    Description: 
This is a follow-up to Uwe's work on LUCENE-6774. 

This patch updates the code to use Morfologik stemming version 2.0.1, which removes the "automatic" lookup of classpath-relative dictionary resources in favor of an explicit InputStream or URL. User code is therefore explicitly responsible for providing these resources, handling missing files, etc.
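
A minimal sketch of what "explicitly responsible" can look like on the caller's side. It does not use the Morfologik API: {{ExplicitDictionaryResource}}, {{ResourceOpener}}, {{readDictionary}} and {{classpath}} are hypothetical names made up for illustration. The only point is that the caller opens the stream and decides how a missing resource is reported.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; none of these names come from
// Morfologik or Lucene.
final class ExplicitDictionaryResource {

    /** Stand-in for whatever locates resources (classpath, file system, ...). */
    interface ResourceOpener {
        InputStream open(String name) throws IOException;
    }

    /** Reads the named resource fully, failing loudly if it cannot be found. */
    static byte[] readDictionary(ResourceOpener opener, String name) {
        try (InputStream in = opener.open(name)) {
            if (in == null) {
                // Nothing is looked up "automatically": a missing resource is
                // the caller's problem and is reported with its name.
                throw new FileNotFoundException("Dictionary resource not found: " + name);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not load dictionary: " + name, e);
        }
    }

    /** An opener that resolves names relative to the given class. */
    static ResourceOpener classpath(Class<?> anchor) {
        // Class.getResourceAsStream returns null when the resource is absent.
        return anchor::getResourceAsStream;
    }
}
```

With this shape the lookup policy lives entirely in the opener that the caller passes in, and a missing dictionary surfaces as an {{UncheckedIOException}} at load time.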

Morfologik had no "default" dictionaries other than the Polish one, so I also removed a number of attributes from the filter code that were, to me, confusing.

* {{MorfologikFilterFactory}} now accepts an (optional) {{dictionary}} attribute which contains an explicit name of the dictionary resource to load. The resource is loaded with a {{ResourceLoader}} passed to the {{inform(..)}} method, so the final location depends on the resource loader.
* There is no way to load the dictionary and metadata separately (this isn't at all useful).
* If the {{dictionary}} attribute is missing, the filter loads the Polish dictionary by default (since most people would be using Morfologik for stemming Polish anyway).

This patch is *not* backward compatible, but it attempts to provide useful feedback on initialization: if the removed attributes were used, it points at this JIRA issue, so it should be clear what to change and how.
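
The attribute handling described above could be sketched roughly as follows. This is not the code from the patch: {{DictionaryArgs}}, {{resolve}} and {{DEFAULT_RESOURCE}} are hypothetical, the default resource name is a placeholder, and the removed attribute names are supplied by the caller because this description does not list them.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the argument handling; not the actual
// MorfologikFilterFactory implementation.
final class DictionaryArgs {

    static final String DICTIONARY_ATTRIBUTE = "dictionary";

    // Placeholder: the real name of the bundled Polish dictionary resource
    // is not given in this issue description.
    static final String DEFAULT_RESOURCE = "<bundled Polish dictionary>";

    /**
     * Returns the dictionary resource name to load. Throws if any attribute
     * from removedAttributes is still present in args.
     */
    static String resolve(Map<String, String> args, Set<String> removedAttributes) {
        for (String removed : removedAttributes) {
            if (args.containsKey(removed)) {
                // Point the user at the issue explaining what changed.
                throw new IllegalArgumentException(
                    "The '" + removed + "' attribute is no longer supported; see LUCENE-6833.");
            }
        }
        String name = args.get(DICTIONARY_ATTRIBUTE);
        // A missing attribute falls back to the default (Polish) dictionary.
        return name != null ? name : DEFAULT_RESOURCE;
    }
}
```

The returned name would then be handed to the {{ResourceLoader}} received in {{inform(..)}}, which decides where the resource actually comes from.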

  was:
This is a follow-up to Uwe's work on LUCENE-6774. 

This patch updates the code to use Morfologik stemming version 2.0.1, which removes the "automatic" lookup of classpath-relative dictionary resources in favor of an explicit InputStream or URL. So the user code is explicitly responsible to provide these resources, reacting to missing files, etc.

There were no other "default" dictionaries in Morfologik other than the Polish dictionary so I also cleaned up the filter code from a number of attributes that were, to me, confusing. 

* {{MorfologikFilterFactory}} now accepts an (optional) {{dictionary}} attribute which contains an explicit name of the dictionary resource to load. The resource is loaded with a {{ResourceLoader}} passed to the {{inform(..)}} method, so the final location depends on the resource loader.
* There is no way to load the dictionary and metadata separately (this isn't at all useful).
* If the {{dictionary}} attribute is missing, the filter loads the Polish dictionary by default (since most people would be using Morfologik for stemming Polish anyway).


> Upgrade morfologik to version 2.0.1, simplify MorfologikFilter's dictionary lookup
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-6833
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6833
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: Trunk
>


