Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2010/04/06 03:40:27 UTC

[jira] Created: (LUCENE-2368) stopword files should be versioned; acessor for default(s) should take a Version property

stopword files should be versioned; acessor for default(s) should take a Version property
-----------------------------------------------------------------------------------------

                 Key: LUCENE-2368
                 URL: https://issues.apache.org/jira/browse/LUCENE-2368
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Analysis
            Reporter: Hoss Man
             Fix For: 2.3.3


The existing language-specific stopword files on the trunk have no version info in their filenames -- this will make it awkward/confusing to update them as time goes on.  Likewise, many classes have a "getDefaultStopSet()" method, and these methods (when called by client code) suffer from the same API back-compat issues that the Analyzers themselves did before we added Version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2368) stopword files should be versioned; acessor for default(s) should take a Version property

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853691#action_12853691 ] 

Hoss Man commented on LUCENE-2368:
----------------------------------

bq. I wonder if we should just break now (by renaming these 3) and version all the files so it's clean.

I didn't realize we even had those.

The other option is to *not* rename any of the files, but to clearly document what the naming convention is going forward. As I mentioned in the comment I just added (with more details beyond the summary description), the names don't have to match Lucene Version semantics; they just have to be unique going forward.  Specifically: we should never modify the contents of an existing file; we should just add a new file and "deprecate" the old one.

But the naming convention could easily be...

stopwords_esperanto.txt
stopwords_esperanto_2.txt
stopwords_esperanto_3.txt
...
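
A minimal sketch of how code could build these sequentially numbered names, assuming the convention shown above (the first revision has no suffix, later revisions append "_2", "_3", and so on). The class and method names (StopwordFileNames, resourceName) are hypothetical and not part of Lucene.

```java
/** Hypothetical helper for the naming convention above; not part of Lucene. */
final class StopwordFileNames {
  private StopwordFileNames() {}

  /**
   * Builds the file name for a given revision of a language's stopword list.
   * Revision 1 is the original, unsuffixed file; later revisions get "_N".
   */
  static String resourceName(String language, int revision) {
    if (revision < 1) {
      throw new IllegalArgumentException("revision must be >= 1: " + revision);
    }
    String base = "stopwords_" + language;
    return (revision == 1 ? base : base + "_" + revision) + ".txt";
  }

  public static void main(String[] args) {
    // Prints the three example names listed above, one per line.
    for (int revision = 1; revision <= 3; revision++) {
      System.out.println(resourceName("esperanto", revision));
    }
  }
}
```

Since a published name would always map to the same contents under this scheme, client code that pins a revision would keep getting identical stop words after an upgrade.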





[jira] Commented: (LUCENE-2368) stopword files should be versioned; acessor for default(s) should take a Version property

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853686#action_12853686 ] 

Robert Muir commented on LUCENE-2368:
-------------------------------------

Yes, we should just go ahead and do this now to prevent future confusion and heartache.

Before this version there were only 3 stopword lists stored as files in Lucene (ar, fa, smartcn).
I wonder if we should just break now (by renaming these 3) and version all the files so it's clean.




[jira] Commented: (LUCENE-2368) stopword files should be versioned; acessor for default(s) should take a Version property

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853688#action_12853688 ] 

Hoss Man commented on LUCENE-2368:
----------------------------------

This is something I brought up with Robert on IRC a few days ago and forgot to file an issue for...

* We should give all the language-specific stopword files something in their name that identifies them, so we can add newer versions over time under distinct names.  The simplest convention moving forward would probably be to name each file after the first Lucene version it was added in (e.g. "russian_stop_3_3.txt"), but there is no reason the names have to correspond directly to the Lucene Version -- they could just as easily be completely sequential (e.g. "russian_stop_001.txt" or "russian_stop_AAA.txt").

* All of the static "getDefaultStopSet()" methods in the various Analyzers should be changed to take a Version param, which picks the appropriate file (or statically compiled set).  Any Analyzer that already has Version-based stopword switching logic in its constructor should instead just delegate to the getDefaultStopSet() method.
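
The two points above could be sketched roughly as follows. This is an illustration under stated assumptions, not Lucene code: ReleaseVersion is a hypothetical stand-in for Lucene's Version, RussianStopwords and defaultStopFile are invented names, and the sketch returns only the chosen file name rather than a loaded stop set so that it stays self-contained.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical stand-in for a release-version constant; not Lucene's Version class. */
enum ReleaseVersion { R1, R2, R3 }

/** Hypothetical helper showing version-based selection of a stopword file. */
final class RussianStopwords {
  // Maps the release in which each file was first added to that file's name.
  // The names are illustrative; entries are only ever added, never edited.
  private static final NavigableMap<ReleaseVersion, String> FILES = new TreeMap<>();
  static {
    FILES.put(ReleaseVersion.R1, "russian_stop_001.txt");
    FILES.put(ReleaseVersion.R3, "russian_stop_002.txt");
  }

  private RussianStopwords() {}

  /** Returns the name of the newest stopword file that existed as of matchVersion. */
  static String defaultStopFile(ReleaseVersion matchVersion) {
    // floorEntry finds the greatest key that is <= matchVersion (enum order).
    Map.Entry<ReleaseVersion, String> entry = FILES.floorEntry(matchVersion);
    if (entry == null) {
      throw new IllegalArgumentException("no stopword file for " + matchVersion);
    }
    return entry.getValue();
  }
}
```

In this sketch a caller passing R2 still gets the file added in R1, which is the back-compat behaviour the Version param is meant to provide; an Analyzer constructor could delegate to the same lookup instead of carrying its own switching logic.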


