Posted to java-user@lucene.apache.org by Markus Fischer <ma...@fischer.name> on 2006/01/31 13:49:46 UTC
Stemming german words
Hi,
I'm currently using the GermanStemmer and it works well. However, today
I found two words which get stemmed to the same stem.
"Suche" and "Sucht" both seem to get stemmed to "such", although they
have completely different meanings in German (Suche = search,
Sucht = addiction).
Is there a way to tune the stemmer, or are there alternative stemmers
available for the German language?
thanks for any pointers,
- Markus
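
One common way to tune a stemmer for a handful of known collisions is an exclusion list: terms on the list bypass stemming and are indexed unchanged. The sketch below is illustrative only. The class `StemExclusionSketch` and its `toyStem` rule are hypothetical stand-ins, not the algorithm used by Lucene's GermanStemmer; they merely reproduce the Suche/Sucht collision described in the question. As far as I know, Lucene's GermanAnalyzer has offered a stem exclusion list for this purpose, but the exact method or constructor differs between versions, so check the Javadoc for the release in use.

```java
import java.util.Locale;
import java.util.Set;

public class StemExclusionSketch {

    // Hypothetical toy rule: lowercase, then strip one trailing "e" or "t"
    // from words longer than four letters. NOT Lucene's GermanStemmer.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.GERMAN);
        if (w.length() > 4 && (w.endsWith("e") || w.endsWith("t"))) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Terms found in the exclusion set are only lowercased, never stemmed.
    static String stemUnlessExcluded(String word, Set<String> exclusions) {
        String w = word.toLowerCase(Locale.GERMAN);
        return exclusions.contains(w) ? w : toyStem(w);
    }

    public static void main(String[] args) {
        Set<String> exclusions = Set.of("sucht");
        // Without the exclusion list both words collapse to the same stem.
        System.out.println(toyStem("Suche") + " / " + toyStem("Sucht"));
        // With the exclusion list "Sucht" keeps its own index term.
        System.out.println(stemUnlessExcluded("Suche", exclusions) + " / "
                + stemUnlessExcluded("Sucht", exclusions));
    }
}
```

Because tokens are lowercased before the lookup, excluding "sucht" also stops the verb form in "er sucht etwas" from matching "Suche", so the exclusion trades one mismatch for another.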
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Stemming german words
Posted by Markus Fischer <ma...@fischer.name>.
Jonathan,
What can I say, I feel like an idiot now. Of course you're
right. This actually solves the issue ;)
thanks and sorry for wasting time,
- Markus
Jonathan O'Connor wrote:
> Markus,
> As I'm sure you know, "sucht" is also an inflection of "suchen", e.g.
> "er sucht etwas". Sadly, you may be able to fix this one problem, but
> there will be hundreds of other problems too. Stemmers are never
> perfect. You just have to live with it.
>
> Most users won't have a problem with that. If they want to search
> for addiction, then they will probably add "drug" or "alcohol", etc.
> to the search.
> Ciao,
> Jonathan O'Connor
> XCOM Dublin
Re: Stemming german words
Posted by Stefan Gusenbauer <st...@kbse.net>.
You could try a POS tagger to check whether the word is used as a noun
and, if so, not stem it. It would be interesting to see whether a POS
tagger can distinguish between "Sucht" as a noun and "sucht" as a verb,
but you could give this a try.
stefan
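
A minimal sketch of that idea, under loud assumptions: `toyTag` is a hypothetical stand-in for a real POS tagger and simply treats any capitalized token that is not sentence-initial as a noun, and `toyStem` is the same kind of toy suffix rule, not Lucene's GermanStemmer. Tokens tagged as nouns are lowercased but not stemmed; everything else is stemmed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PosGatedStemmingSketch {

    enum Tag { NOUN, OTHER }

    // Stand-in tagger (an assumption, not a real POS tagger): a capitalized
    // token that does not start the sentence is treated as a noun.
    static Tag toyTag(String token, int position) {
        boolean capitalized = Character.isUpperCase(token.charAt(0));
        return (capitalized && position > 0) ? Tag.NOUN : Tag.OTHER;
    }

    // Hypothetical toy rule: strip one trailing "e" or "t" from lowercase
    // words longer than four letters.
    static String toyStem(String lower) {
        if (lower.length() > 4 && (lower.endsWith("e") || lower.endsWith("t"))) {
            return lower.substring(0, lower.length() - 1);
        }
        return lower;
    }

    // Produce index terms: nouns are only lowercased, other tokens are stemmed.
    static List<String> analyze(String sentence) {
        String[] tokens = sentence.trim().split("\\s+");
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].isEmpty()) {
                continue; // happens only for blank input
            }
            String lower = tokens[i].toLowerCase(Locale.GERMAN);
            terms.add(toyTag(tokens[i], i) == Tag.NOUN ? lower : toyStem(lower));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Er sucht etwas"));
        System.out.println(analyze("Die Sucht ist stark"));
    }
}
```

The same analysis would have to run on the query side, and a one-word query gives a tagger almost no context to work with, so this approach is harder in practice than the sketch suggests.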
Re: Stemming german words
Posted by Jonathan O'Connor <Jo...@xcom.de>.
Markus,
As I'm sure you know, "sucht" is also an inflection of "suchen", e.g. "er
sucht etwas". Sadly, you may be able to fix this one problem, but there
will be hundreds of other problems too. Stemmers are never perfect. You
just have to live with it.
Most users won't have a problem with that. If they want to search for
addiction, then they will probably add "drug" or "alcohol", etc. to the
search.
Ciao,
Jonathan O'Connor
XCOM Dublin
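
The narrowing effect of an extra query term can be sketched with a tiny in-memory search. Everything here is an assumption made for illustration: the two sample documents are invented, `toyStem` is a toy suffix rule rather than Lucene's GermanStemmer, and `search` requires every query term to be present (AND semantics). With OR semantics the extra term would mainly influence ranking instead of filtering.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class QueryNarrowingSketch {

    // Hypothetical toy rule: lowercase, then strip one trailing "e" or "t"
    // from words longer than four letters.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.GERMAN);
        if (w.length() > 4 && (w.endsWith("e") || w.endsWith("t"))) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Split on whitespace and stem each token into a set of terms.
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(toyStem(t));
            }
        }
        return out;
    }

    // Return the positions of documents containing every stemmed query term.
    static List<Integer> search(List<String> docs, String query) {
        Set<String> wanted = terms(query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (terms(docs.get(i)).containsAll(wanted)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Er sucht etwas", "Die Sucht nach Alkohol");
        System.out.println(search(docs, "Sucht"));         // both documents match
        System.out.println(search(docs, "Sucht Alkohol")); // only the second matches
    }
}
```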