Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/04/09 03:14:59 UTC
[Nutch Wiki] Update of "German" by ChiragChaman
The following page has been changed by ChiragChaman:
http://wiki.apache.org/nutch/German
New page:
= Plugin: German =
The plugin enables German-language stemming during indexing and searching. Common German stop words are removed from both document content and queries.
The package contains:
* A German BasicIndexingFilter to replace the standard BasicIndexingFilter.
* A German BasicQueryFilter to replace the standard BasicQueryFilter.
* A stop-list "german-stopword.txt" used by both.
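As a rough illustration of the stop-word part shared by both filters, here is a minimal Java sketch, assuming text is split on non-letter characters and words are compared in lower case. `GermanStopWordSketch` and `removeStopWords` are hypothetical names, not taken from the plugin, and stemming (which the plugin leaves to Lucene's German analyzer) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch, not the plugin's code: stop-word removal only.
 * Stemming (done by Lucene's German analyzer in the plugin) is omitted.
 */
class GermanStopWordSketch {

    /** Splits text on non-letters, lower-cases each token and drops stop words. */
    static List<String> removeStopWords(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        // \P{L}+ matches runs of non-letter characters, so umlauts and the
        // sharp s stay inside their tokens
        for (String token : text.split("\\P{L}+")) {
            if (token.isEmpty()) {
                continue; // a leading separator yields an empty first element
            }
            String lower = token.toLowerCase(Locale.GERMAN);
            if (!stopWords.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("der", "die", "das", "und");
        // prints [hund, katze]
        System.out.println(removeStopWords("Der Hund und die Katze", stop));
    }
}
```

Splitting on Unicode letter classes rather than on `[a-zA-Z]` keeps words such as "Straße" or "Über" intact.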
Download at http://nutch.eventax.com/
== Config File Options ==
=== german.stopword.file ===
Default file name: german-stopword.txt
The file must be placed in the CLASSPATH/conf directory.
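A minimal Java sketch of how this option and its default might be resolved, assuming the option is read from a `java.util.Properties` object and that the conf directory itself is on the CLASSPATH, so the file is looked up by its bare name. `StopWordFileLocator` and its methods are hypothetical, not the plugin's code.

```java
import java.io.InputStream;
import java.util.Properties;

/**
 * Hypothetical sketch, not the plugin's code: resolves the
 * german.stopword.file option and opens the file from the CLASSPATH.
 */
class StopWordFileLocator {

    static final String OPTION = "german.stopword.file";
    static final String DEFAULT_FILE = "german-stopword.txt";

    /** Returns the configured file name, falling back to the documented default. */
    static String fileName(Properties conf) {
        return conf.getProperty(OPTION, DEFAULT_FILE);
    }

    /**
     * Opens the stop list as a classpath resource. Assumes the conf
     * directory is on the CLASSPATH, so the bare file name is used.
     * Returns null when the resource cannot be found.
     */
    static InputStream open(Properties conf) {
        return StopWordFileLocator.class.getClassLoader()
                .getResourceAsStream(fileName(conf));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(fileName(conf)); // german-stopword.txt
    }
}
```

The page does not say how the plugin behaves when the file is missing, so this sketch simply returns null and leaves that decision to the caller.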
Syntax:
{{{
#List of stopwords:
der
die
das
and
a
...
}}}
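The syntax above suggests one stop word per line, with lines starting with `#` treated as comments. A minimal Java sketch of a parser under that assumption (it also assumes blank lines are skipped and words are stored in lower case); `StopWordListParser` is a hypothetical name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch, not the plugin's code: parses the stop-list format. */
class StopWordListParser {

    /**
     * Reads one stop word per line. Assumes lines starting with '#' are
     * comments and blank lines are ignored; words are stored in lower case.
     */
    static Set<String> parse(Reader source) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (word.isEmpty() || word.startsWith("#")) {
                    continue;
                }
                words.add(word.toLowerCase(Locale.GERMAN));
            }
        } catch (IOException e) {
            // wrap so callers need not handle a checked exception
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "#List of stopwords:\nder\ndie\ndas\n";
        // prints [der, die, das]
        System.out.println(parse(new StringReader(sample)));
    }
}
```

The page does not state the file's character encoding, so the sketch takes a `Reader` and leaves the charset to the caller, for example `new InputStreamReader(stream, StandardCharsets.ISO_8859_1)` if the file is Latin-1.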
== Internal Documentation ==
The GermanAnalyzer from the Lucene package (org.apache.lucene.analysis.de) is used.
The GermanBasicIndexingFilter works approximately 10
== Searching ==
Stop words may be used in a query. They are ignored for matching, but are highlighted in the results like normal hits.
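One way to read this behaviour, sketched in Java under the assumption that "emphasized" means highlighted in result summaries: stop words are dropped from the terms used for matching but kept in the list of terms to highlight. `QueryStopWordSketch` and `FilteredQuery` are hypothetical names, and query terms are assumed to be separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch, not the plugin's code: query-side stop-word handling. */
class QueryStopWordSketch {

    /** Terms used for matching and terms used for highlighting. */
    static final class FilteredQuery {
        final List<String> matchTerms;     // stop words removed
        final List<String> highlightTerms; // every query term, stop words included

        FilteredQuery(List<String> matchTerms, List<String> highlightTerms) {
            this.matchTerms = matchTerms;
            this.highlightTerms = highlightTerms;
        }
    }

    /** Splits the query on whitespace and separates stop words from match terms. */
    static FilteredQuery filter(String query, Set<String> stopWords) {
        List<String> match = new ArrayList<>();
        List<String> highlight = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an all-blank query splits into one empty token
            }
            String term = token.toLowerCase(Locale.GERMAN);
            highlight.add(term);
            if (!stopWords.contains(term)) {
                match.add(term);
            }
        }
        return new FilteredQuery(match, highlight);
    }

    public static void main(String[] args) {
        FilteredQuery q = filter("der Hund", Set.of("der", "die", "das"));
        System.out.println(q.matchTerms);     // [hund]
        System.out.println(q.highlightTerms); // [der, hund]
    }
}
```

Keeping two separate lists means a query such as "der Hund" retrieves the same documents as "Hund", while the summary can still mark both words.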
-- HammoudaBouyedda - 28 Sep 2004