You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2003/10/09 16:17:57 UTC

GermanAnalyzer.java GermanStemmer.java

Moving to lucene-user list.

If not the author, maybe some users of this code can tell us how this
uppercase/lowercase business should work.

And the issue even includes patches.  I don't use the German* stuff, so
I'm afraid of applying it and breaking things for people who do use
German* classes as they are currently.

Otis

--- Erik Hatcher <er...@ehatchersolutions.com> wrote:
> It seems to be the issue mentioned here as well:
> 
> 	http://nagoya.apache.org/bugzilla/show_bug.cgi?id=18410
> 
> 
> On Wednesday, October 8, 2003, at 09:41  PM, Otis Gospodnetic wrote:
> > Answer to question comment: possibly because nouns start with a
> capital
> > letter in German, so lowercasing may not be the right thing to do.
> > This is a bit of a guess.  Maybe the author will enlighten us. :)
> >
> > Otis
> >
> > --- ehatcher@apache.org wrote:
> >> ehatcher    2003/10/08 17:08:52
> >>   Revision  Changes    Path
> >>   1.7       +3 -2
> >>
> > jakarta-lucene/src/java/org/apache/lucene/analysis/de/ 
> > GermanAnalyzer.java
> >>
> >>   Index: GermanAnalyzer.java
> >>  
> ===================================================================
> >>   RCS file:
> >>
> > /home/cvs/jakarta-lucene/src/java/org/apache/lucene/analysis/de/ 
> > GermanAnalyzer.java,v
> >>   retrieving revision 1.6
> >>   retrieving revision 1.7
> >>   diff -u -r1.6 -r1.7
> >>   --- GermanAnalyzer.java	29 Jan 2003 17:18:53 -0000	1.6
> >>   +++ GermanAnalyzer.java	9 Oct 2003 00:08:52 -0000	1.7
> >>   @@ -169,7 +169,8 @@
> >>        {
> >>    	TokenStream result = new StandardTokenizer( reader );
> >>    	result = new StandardFilter( result );
> >>   -	result = new StopFilter( result, stoptable );
> >>   +  // shouldn't there be a lowercaser before stop word
> filtering?
> >>   +  result = new StopFilter( result, stoptable );
> >>    	result = new GermanStemFilter( result, excltable );
> >>    	return result;
> >>        }
> >>
> >>
> >>



__________________________________
Do you Yahoo!?
The New Yahoo! Shopping - with improved product search
http://shopping.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: GermanAnalyzer.java GermanStemmer.java

Posted by Gerhard Schwarz <ge...@fpg.de>.

Hello,

Otis Gospodnetic schrieb:
> If not the author, maybe some users of this code can tell us how this
> uppercase/lowercase business should work.

Lowercasefilter before Stopfilter would be fine. The only reason not to 
do so was a semantic issue. But that issue is not important for the use 
with Lucene. The stemmer was written for an application I use, for the 
use with lucene (for normal fulltext inquiry at all) I can be simpler.

> And the issue even includes patches.  I don't use the German* stuff, so
> I'm afraid of applying it and breaking things for people who do use
> German* classes as they are currently.

After months of work I now have spare time.
The patch I promised months ago comes by next week. (No promise this 
time, or tomorrow something bad will happen in my office...)

I hope that the updated stemmer will fully work with existing indices.


Gerhard (back on Java development)


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org