Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/04/09 03:14:59 UTC

[Nutch Wiki] Update of "German" by ChiragChaman

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by ChiragChaman:
http://wiki.apache.org/nutch/German

New page:

= Plugin: German =

The plugin enables German-language stemming during indexing and searching. German stop words are removed from both the indexed content and the query.

The package contains:

  * A German BasicIndexingFilter to replace the standard BasicIndexingFilter.
  * A German BasicQueryFilter to replace the standard BasicQueryFilter.
  * A stop-list "german-stopword.txt" used by both.

Download at http://nutch.eventax.com/

== Config File Options ==

=== german.stopword.file ===

Default filename: german-stopword.txt

german-stopword.txt must be placed in the CLASSPATH/conf directory.

Syntax:
#List of stopwords:
der
die
das
and
a
...
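
To illustrate the file format above, here is a minimal sketch of how such a stop-list could be parsed. It is not the plugin's own loader: the class name StopwordFileSketch and the method parse are hypothetical, and the handling of blank lines, '#' comment lines and lower-casing are assumptions based only on the sample shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the plugin: parses a stop-list with one
// word per line, where a line starting with '#' is treated as a comment.
final class StopwordFileSketch {

    static Set<String> parse(Reader source) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                // Assumption: blank lines and '#' lines carry no stop word.
                if (word.isEmpty() || word.startsWith("#")) {
                    continue;
                }
                // Assumption: stop words are matched case-insensitively.
                words.add(word.toLowerCase(Locale.GERMAN));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "#List of stopwords:\nder\ndie\ndas\n";
        System.out.println(parse(new StringReader(sample))); // prints [der, die, das]
    }
}
```

In a real setup the Reader would wrap the file found under CLASSPATH/conf rather than an in-memory string.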

== Internal Documentation ==

The German Analyzer from the Lucene package is used.

The GermanBasicIndexingFilter works approximately 10

== Searching ==

Stop words may be used in a query. They are ignored during the search, but they are still highlighted in the results like normal hits.
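
As a rough illustration of stop words being ignored in a query, the sketch below drops them from a whitespace-separated query string. It is only an assumption about the general idea, not the code of the German BasicQueryFilter: the class QueryStopwordSketch, the method dropStopWords and the whitespace tokenization are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the plugin: removes stop words from a
// query before the remaining terms are searched.
final class QueryStopwordSketch {

    static List<String> dropStopWords(String query, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        // Assumption: query terms are separated by whitespace.
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty query yields a single empty token
            }
            // Assumption: the stop-list holds lower-case words.
            if (!stopWords.contains(term.toLowerCase(Locale.GERMAN))) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("der", "die", "das"));
        // prints [Geschichte, Stadt]
        System.out.println(dropStopWords("die Geschichte der Stadt", stopWords));
    }
}
```

The original query string is left untouched here, so a caller could still use it to highlight the stop words in the displayed results.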

-- HammoudaBouyedda - 28 Sep 2004