Posted to dev@lucene.apache.org by "Vincent Li (JIRA)" <ji...@apache.org> on 2008/11/25 00:39:44 UTC

[jira] Created: (LUCENE-1469) isValid should be invoked after analyze rather than before it so it can validate the output of analyze

isValid should be invoked after analyze rather than before it so it can validate the output of analyze
------------------------------------------------------------------------------------------------------

                 Key: LUCENE-1469
                 URL: https://issues.apache.org/jira/browse/LUCENE-1469
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/*
    Affects Versions: 2.4
            Reporter: Vincent Li
            Priority: Minor


The SynonymMap class has a protected method, String analyze(String word), designed for custom stemming.

However, before analyze is invoked on a word, boolean isValid(String str) is used to validate the word, which causes the program to discard words that may be usable by the custom analyze method.

I think that isValid should be invoked after analyze rather than before it, so that it can validate the output of analyze and allow implementers to decide what is valid for the overridden analyze method. (In fact, if you look at the code snippet below, isValid should really go after the empty string check.)

This is a two-line change in org.apache.lucene.index.memory.SynonymMap:

      /*
       * Part B: ignore phrases (with spaces and hyphens) and
       * non-alphabetic words, and let user customize word (e.g. do some
       * stemming)
       */
      if (!isValid(word)) continue; // ignore
      word = analyze(word);
      if (word == null || word.length() == 0) continue; // ignore
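
To make the effect of the reordering concrete, here is a minimal, self-contained sketch. It is not the Lucene source: the class name ReorderedFilterSketch, the letters-only isValid rule, and the hyphen-stripping analyze are all assumptions chosen for illustration. Only the three-line loop body is taken from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the filtering loop; NOT org.apache.lucene.index.memory.SynonymMap.
public class ReorderedFilterSketch {

    // Assumed validity rule for illustration: every character must be a letter.
    static boolean isValid(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isLetter(str.charAt(i))) return false;
        }
        return true;
    }

    // Stand-in for a custom analyze override: strips hyphens from the word.
    static String analyze(String word) {
        return word.replace("-", "");
    }

    // Current order: validate the raw word, then analyze it.
    static List<String> validateThenAnalyze(String[] words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!isValid(word)) continue; // ignore
            word = analyze(word);
            if (word == null || word.length() == 0) continue; // ignore
            kept.add(word);
        }
        return kept;
    }

    // Proposed order: analyze first, drop empty results, then validate the output.
    static List<String> analyzeThenValidate(String[] words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            word = analyze(word);
            if (word == null || word.length() == 0) continue; // ignore
            if (!isValid(word)) continue; // ignore
            kept.add(word);
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] words = {"run", "co-operate", "-", "x2"};
        System.out.println(validateThenAnalyze(words)); // [run]
        System.out.println(analyzeThenValidate(words)); // [run, cooperate]
    }
}
```

With the current order, "co-operate" is rejected before the custom analyze ever sees it. With the proposed order it survives as "cooperate", while "-" is dropped by the empty-string check and "x2" is still rejected by isValid.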

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1469) isValid should be invoked after analyze rather than before it so it can validate the output of analyze

Posted by "Vincent Li (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675180#action_12675180 ] 

Vincent Li commented on LUCENE-1469:
------------------------------------

Hi Mark, sorry for the late response; I've been away for a while. I would gladly submit one. Can you point me to some info on how to submit a patch?

> isValid should be invoked after analyze rather than before it so it can validate the output of analyze
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1469
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1469
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Vincent Li
>            Priority: Minor
>   Original Estimate: 0.08h
>  Remaining Estimate: 0.08h
>
> The SynonymMap class has a protected method, String analyze(String word), designed for custom stemming.
> However, before analyze is invoked on a word, boolean isValid(String str) is used to validate the word, which causes the program to discard words that may be usable by the custom analyze method.
> I think that isValid should be invoked after analyze rather than before it, so that it can validate the output of analyze and allow implementers to decide what is valid for the overridden analyze method. (In fact, if you look at the code snippet below, isValid should really go after the empty string check.)
> This is a two-line change in org.apache.lucene.index.memory.SynonymMap:
>       /*
>        * Part B: ignore phrases (with spaces and hyphens) and
>        * non-alphabetic words, and let user customize word (e.g. do some
>        * stemming)
>        */
>       if (!isValid(word)) continue; // ignore
>       word = analyze(word);
>       if (word == null || word.length() == 0) continue; // ignore



[jira] Commented: (LUCENE-1469) isValid should be invoked after analyze rather than before it so it can validate the output of analyze

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653285#action_12653285 ] 

Mark Miller commented on LUCENE-1469:
-------------------------------------

This makes sense to me. Care to submit a patch?



[jira] Commented: (LUCENE-1469) isValid should be invoked after analyze rather than before it so it can validate the output of analyze

Posted by "Vincent Li (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650390#action_12650390 ] 

Vincent Li commented on LUCENE-1469:
------------------------------------

On second thought - it might be a better idea to change isValid to a protected method so that it can be overridden as needed.
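
As a rough illustration of this alternative, the sketch below keeps the existing call order and instead lets a subclass replace the validity rule. Everything here is hypothetical: BaseMap stands in for SynonymMap, its letters-only isValid is an assumed default, and HyphenFriendlyMap is an invented subclass. It assumes, as the comment above implies, that isValid cannot currently be overridden.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "protected isValid" idea; not the Lucene source.
public class OverridableValidationSketch {

    // Stand-in for SynonymMap with both hooks exposed to subclasses.
    public static class BaseMap {
        // Assumed default rule: every character must be a letter.
        protected boolean isValid(String str) {
            for (int i = 0; i < str.length(); i++) {
                if (!Character.isLetter(str.charAt(i))) return false;
            }
            return true;
        }

        // Default analyze leaves the word unchanged.
        protected String analyze(String word) {
            return word;
        }

        // Same order as the existing snippet: validate, analyze, empty check.
        public List<String> filter(String[] words) {
            List<String> kept = new ArrayList<>();
            for (String word : words) {
                if (!isValid(word)) continue; // ignore
                word = analyze(word);
                if (word == null || word.length() == 0) continue; // ignore
                kept.add(word);
            }
            return kept;
        }
    }

    // Invented subclass: accepts hyphenated input and strips the hyphens in analyze.
    public static class HyphenFriendlyMap extends BaseMap {
        @Override
        protected boolean isValid(String str) {
            for (int i = 0; i < str.length(); i++) {
                char c = str.charAt(i);
                if (!Character.isLetter(c) && c != '-') return false;
            }
            return true;
        }

        @Override
        protected String analyze(String word) {
            return word.replace("-", "");
        }
    }

    public static void main(String[] args) {
        String[] words = {"run", "co-operate", "x2"};
        System.out.println(new BaseMap().filter(words));           // [run]
        System.out.println(new HyphenFriendlyMap().filter(words)); // [run, cooperate]
    }
}
```

The trade-off against reordering is that an implementer has to override two methods consistently, but the default behaviour stays exactly as it is for existing callers.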
