Posted to java-user@lucene.apache.org by Markus Fischer <ma...@fischer.name> on 2006/01/31 13:49:46 UTC

Stemming german words

Hi,

I'm currently using the GermanStemmer and it works well. However, today 
I found two words which get stemmed to the same stem.

"Suche" and "Sucht" both seem to get stemmed to "such", although 
they have completely different meanings in German (Suche = search, 
Sucht = addiction).

Is there a way to tune the stemmer, or should I look for an alternative 
stemmer for the German language?

thanks for any pointers,
- Markus

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Stemming german words

Posted by Markus Fischer <ma...@fischer.name>.
Jonathan,

What can I say, I feel like an idiot now. Of course you're 
right. This actually solves the issue ;)

thanks and sorry for wasting time,
- Markus

Jonathan O'Connor wrote:
> Markus,
> As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. 
> "er sucht etwas". Sadly, you may be able to fix this one problem, but 
> there will be hundreds of other problems too. Stemmers are never 
> perfect. You just have to live with it.
> 
> Most users won't have a problem with that. If they want to search 
> for addiction, they will probably add "drug" or "alcohol", etc. 
> to the search.
> Ciao,
> Jonathan O'Connor
> XCOM Dublin


Re: Stemming german words

Posted by Stefan Gusenbauer <st...@kbse.net>.
Jonathan O'Connor wrote:

> Markus,
> As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. 
> "er sucht etwas". Sadly, you may be able to fix this one problem, but 
> there will be hundreds of other problems too. Stemmers are never 
> perfect. You just have to live with it.
>
> Most users won't have a problem with that. If they want to search 
> for addiction, they will probably add "drug" or "alcohol", etc. 
> to the search.
> Ciao,
> Jonathan O'Connor
> XCOM Dublin
>
You could try a POS tagger to check whether the word is used as a noun 
and, if it is, skip stemming it. It would be interesting to see whether a 
POS tagger can actually distinguish between "Sucht" as a noun and "sucht" 
as a verb, but you could give it a try.
stefan
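
The noun check suggested above could be sketched roughly as follows. This is only an illustration under loud assumptions: `looksLikeNoun` is a hypothetical stand-in for a real POS tagger (it merely treats capitalized, non-sentence-initial tokens as nouns), and `toyStem` is a toy suffix stripper, not the rule set of Lucene's GermanStemmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class NounAwareStemming {
    // Hypothetical stand-in for a POS tagger: German nouns are capitalized,
    // so a capitalized token that is not sentence-initial is treated as a noun.
    static boolean looksLikeNoun(List<String> tokens, int index) {
        String token = tokens.get(index);
        return index > 0 && !token.isEmpty() && Character.isUpperCase(token.charAt(0));
    }

    // Toy stemmer (an assumption for illustration): lowercases the word and
    // strips one of "en", "e", "t" if at least three characters remain.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.GERMAN);
        for (String suffix : List.of("en", "e", "t")) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Nouns are only lowercased; every other token is stemmed.
    static List<String> analyze(List<String> tokens) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            terms.add(looksLikeNoun(tokens, i)
                    ? token.toLowerCase(Locale.GERMAN)
                    : toyStem(token));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze(List.of("er", "sucht", "etwas")));
        System.out.println(analyze(List.of("die", "Sucht", "nach", "Alkohol")));
    }
}
```

With this sketch the verb "sucht" is still reduced to "such", while the noun "Sucht" is indexed as "sucht" and no longer collides with it. The price is that nouns are not conflated with their own inflected forms any more, and the capitalization heuristic gives the wrong answer for a noun at the start of a sentence, which is where a real tagger would be needed.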


Re: Stemming german words

Posted by Jonathan O'Connor <Jo...@xcom.de>.
Markus,
As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er
sucht etwas". Sadly, you may be able to fix this one problem, but there
will be hundreds of other problems too. Stemmers are never perfect. You
just have to live with it.
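
To make the collision concrete, here is a minimal sketch of suffix stripping. The suffix list and the `toyStem` helper are assumptions made up for this illustration; they are not the actual rules of Lucene's GermanStemmer, which are more elaborate.

```java
import java.util.List;
import java.util.Locale;

class StemCollisionDemo {
    // Hypothetical toy suffixes, checked in this order.
    private static final List<String> SUFFIXES = List.of("en", "e", "t");

    // Lowercases the word and strips the first matching suffix,
    // provided at least three characters remain.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.GERMAN);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // The noun "Suche", the noun "Sucht", the verb form "sucht" and the
        // infinitive "suchen" all end up as the same index term "such".
        for (String word : List.of("Suche", "Sucht", "sucht", "suchen")) {
            System.out.println(word + " -> " + toyStem(word));
        }
    }
}
```

Any rule that maps "sucht" to the stem of "suchen" will also catch the noun "Sucht" once case is folded, which is why this kind of collision is hard to remove without extra information about the word.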

Most users won't have a problem with that. If they want to search for
addiction, they will probably add "drug" or "alcohol", etc. to the
search.
Ciao,
Jonathan O'Connor
XCOM Dublin


                                                                           
*** XCOM AG Legal Disclaimer ***

Diese E-Mail einschliesslich ihrer Anhaenge ist vertraulich und ist allein
für den Gebrauch durch den vorgesehenen Empfaenger bestimmt. Dritten ist
das Lesen, Verteilen oder Weiterleiten dieser E-Mail untersagt. Wir bitten,
eine fehlgeleitete E-Mail unverzueglich vollstaendig zu loeschen und uns
eine Nachricht zukommen zu lassen.

This email may contain material that is confidential and for the sole use
of the intended recipient. Any review, distribution by others or forwarding
without express permission is strictly prohibited. If you are not the
intended recipient, please contact the sender and delete all copies.

Hauptsitz: Bahnstrasse 33, D-47877 Willich, USt-IdNr.: DE 812 885 664
Kommunikation: Telefon +49 2154 9209-70, Telefax +49 2154 9209-900,
www.xcom.de
Handelsregister: Amtsgericht Krefeld, HRB 10340
Vorstand: Matthias Albrecht, Renate Becker-Grope, Marco Marty
Vorsitzender des Aufsichtsrates: Stephan Steuer