Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/05/19 23:14:52 UTC
[Nutch Wiki] Update of "Features" by KurosakaTeruhiko
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by KurosakaTeruhiko:
http://wiki.apache.org/nutch/Features
------------------------------------------------------------------------------
Missing from the current Nutch documentation (Tutorial, FAQ) is a list of features. This wiki page could help, if someone who knows the answers can edit it.
*What kind of searches does Nutch support? (quoted, nested, truncation, wildcarding [and where], Boolean)
+ * "...." (phrase search?), + (what is this for?), - (negation), and fieldname:term. There is no "AND" or "OR" operator; AND logic is implied.
*Is stemming an option?
* According to the [http://www.lucenebook.com/ Lucene in Action] book: "Nutch does not use stemming or term aliasing of any kind. Search engines have not historically done much stemming, but it is a question that comes up regularly." -- page 329
*What kind of stemming does Nutch use? (and can you add exceptions/changes?)
* See previous answer :)
*Does Nutch support Boolean operators? (can you use Google-like plus or minus or are you stuck with 1990s terms?)
+ * No
*Does Nutch support weighted field searching, synonym support?
*What kinds of indexes does Nutch build? (multi-format indexing, incremental indexing, spell-check support, thesauri support, fielded searching, rank-by-reputation?)
*How does the search engine handle punctuation and special characters? (and what's configurable?)
+ * They are treated like a space.
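As a rough illustration of what "treated like a space" means for a query string, here is a minimal Python sketch. It is not Nutch's analyzer; the function name `split_terms` is hypothetical, and the sketch ignores the operator characters (quotes, +, -, and the colon in fieldname:term) mentioned in the first answer.

```python
import re

def split_terms(query: str) -> list[str]:
    """Split a query into terms, treating punctuation as whitespace.

    Assumption: every character that is not a letter or digit counts as
    punctuation. This only mirrors the wiki answer; the real tokenizer
    in Nutch may behave differently.
    """
    # Replace each non-alphanumeric character (including "_") with a space,
    # then split on runs of whitespace.
    return re.sub(r"[^\w]|_", " ", query).split()

print(split_terms("state-of-the-art search!"))
```

Under this assumption, a hyphenated word is searched as several separate terms rather than as one token.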
*Which document formats are supported?
* Judging from the names of the available parser plugins, the following formats are probably supported. However, only plain text and HTML are enabled by default. To handle other document types, edit conf/nutch-site.xml and change the value of the plugin.includes property to include the corresponding plugins:
* Plain Text (plugin: parse-text)