Posted to dev@lucene.apache.org by bu...@apache.org on 2004/06/23 12:45:43 UTC

DO NOT REPLY [Bug 29756] New: - analyzer refactoring based on CVS HEAD from 6/21/2004

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=29756>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

           Summary: analyzer refactoring based on CVS HEAD from 6/21/2004
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Analysis
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: rasik.pandey@ajlsm.com


Hello,

As mentioned in previous exchanges, notably with Grant Ingersoll, I added some 
new classes to the "analysis" package to meet the requirements of the feature 
request in Bugzilla (http://issues.apache.org/bugzilla/show_bug.cgi?id=28182) 
and did some refactoring while I was under the hood. Here is an overview of 
the class hierarchies after my changes:

-Analyzer
--CustomAnalyzer (new abstract class largely based on Grant's BaseAnalyzer)
--AbstractAnalyzer (new abstract class)
---RussianAnalyzer
---GermanAnalyzer
---etc.

-Tokenizer
--CloneableTokenizer (new abstract class)
---StandardTokenizer
---CharTokenizer
---CJKTokenizer
---etc.

-TokenFilter
--CloneableTokenFilter (new abstract class)
---AbstractStemFilter (new abstract class)
----RussianStemFilter
----GermanStemFilter
----etc.

-Stemmer (very simple new interface used in AbstractStemFilter)
--PorterStemmer
--RussianStemmer
--etc.
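
To illustrate how the Stemmer / AbstractStemFilter split above might fit 
together, here is a minimal, self-contained sketch. It does not use the 
Lucene API and is not the code in the attached patch: the stem(String) 
signature, the class names ending in "Sketch", and the toy plural-stripping 
stemmer are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the "very simple" interface; the real signature is in the patch.
interface Stemmer {
    String stem(String term);
}

// Stand-in for AbstractStemFilter: it owns the token loop and delegates
// the language-specific work to a Stemmer. Tokens are plain strings here,
// not Lucene Token objects.
abstract class AbstractStemFilterSketch {
    private final Stemmer stemmer;

    protected AbstractStemFilterSketch(Stemmer stemmer) {
        this.stemmer = stemmer;
    }

    // Apply the stemmer to every token and return the stemmed tokens in order.
    public List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(stemmer.stem(token));
        }
        return out;
    }
}

// Toy stemmer (hypothetical): strips a single trailing "s".
final class PluralStemmerSketch implements Stemmer {
    @Override
    public String stem(String term) {
        boolean plural = term.length() > 1 && term.endsWith("s");
        return plural ? term.substring(0, term.length() - 1) : term;
    }
}

// A concrete filter only has to pick its stemmer, which is roughly the role
// RussianStemFilter or GermanStemFilter would play in the hierarchy above.
final class PluralStemFilterSketch extends AbstractStemFilterSketch {
    PluralStemFilterSketch() {
        super(new PluralStemmerSketch());
    }
}
```

With this shape, adding a language would mean writing one Stemmer 
implementation and a small filter subclass, with the token loop shared.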

The attached zip file contains 3 diff files (core.analysis, 
sandbox.analysis, and sandbox.analysis.snowball) and a zip with the new 
classes for org.apache.lucene.analysis in the Lucene core. I tried to minimize 
irrelevant code changes (e.g. style, spaces, etc.) in the diffs while 
conforming to the code formatting guidelines outlined by Otis. I think a 
number of classes in the "analysis" package didn't conform, so these diffs 
may contain a lot of noise because I reformatted those classes with my IDE, 
sorry :( . If the diffs are too painful, let me know and I'll try to prune 
them. 

If there is a TODO list specific to Analyzers, are the items below on that 
list?

1) move German and Russian packages to sandbox (I think this is on the Lucene 
TODO list)
2) Analyzer class renaming such that dynamic configuration could return 
classes like Analyzer_ru, Analyzer_de, Analyzer_fr, etc. based on the class 
naming scheme "Analyzer_{Locale.toString}"
3) Documentation
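
As a rough illustration of item 2, the lookup could be sketched as below. 
This is only a sketch under assumptions: the helper class AnalyzerNames, its 
method names, the package argument, and the fallback behaviour are all 
hypothetical; only the "Analyzer_{Locale.toString}" naming scheme comes from 
the item above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene or of the attached patch.
final class AnalyzerNames {
    private AnalyzerNames() {}

    // Build the simple class name from the naming scheme
    // "Analyzer_{Locale.toString}", e.g. Analyzer_de or Analyzer_fr_FR.
    static String classNameFor(Locale locale) {
        return "Analyzer_" + locale.toString();
    }

    // Try to load the locale-specific class from the given package;
    // return the supplied fallback class if no such class exists.
    static Class<?> resolve(String packageName, Locale locale, Class<?> fallback) {
        try {
            return Class.forName(packageName + "." + classNameFor(locale));
        } catch (ClassNotFoundException e) {
            return fallback;
        }
    }
}
```

Note that Locale.toString() includes the country when one is set, so a 
caller passing Locale.FRANCE would look for Analyzer_fr_FR rather than 
Analyzer_fr; a real implementation would probably want to retry with the 
language alone.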

Questions, comments, feedback, and criticisms are all welcome.

Regards,
RBP

PS - Thanks Grant!
