Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/04/09 03:12:56 UTC

[Nutch Wiki] Update of "CommandLineOptions" by ChiragChaman

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by ChiragChaman:
http://wiki.apache.org/nutch/CommandLineOptions

New page:

= Articles about Nutch =

Please add the newest articles at the top.

2005-08-25 '''Nutch: Angriff auf Google''' netzwoche (30), page 14, German

2004-08-17 '''Google dicht auf den Fersen: Freie Suchmaschinen''' silicon.de, German

2004-06-11 '''WOS3: Freie Suchmaschinen sollen der Monopolbildung entgegenwirken''' Heise Online, German

2004-06-10 '''Nutch: The Free Search Alternative to Google''' TELEPOLIS, Interview with Doug

2004-06-10 '''Nutch: die freie Suchalternative zu Google''' TELEPOLIS, Interview with Doug, German

2004-05-28 '''Google Blogoscoped''' interview

2004-04-02 '''Building Nutch: Open Source Search''' acmqueue

-- Main.DawidWeiss - 01 Dec 2004

* plugin name: Online Search Results Clustering using Carrot2's Lingo component
* plugin version: 0.9.0

* provider: Dawid Weiss, The Carrot2 project
* plugin home url: Included in Nutch CVS. Project home page: http://carrot2.sourceforge.net
* plugin download url: A binary is included in Nutch CVS. The plugin builds together with Nutch.
* license: BSD-style

* short description: Search results clustering plugin.
* long description: A plugin that clusters search results into groups of (hopefully related) documents.
* configurable parameters: See the defaults defined in nutch-default.xml (search for 'clustering').
* meta data added to index: None. Clustering is performed dynamically for each result set.
* required jars: Many. The entire lib folder of the plugin must be on the classpath.
* plugin extension points:

* plugin extension point interface: net.nutch.clustering.OnlineClusterer
* plugin extension point xml snippet: ?


= Installation guide =

* Create an index using the instructions in the Nutch documentation.
* Deploy the Nutch web application and make sure the index is found and works (type a query and check that you get results).

* Stop the web container (Tomcat).
* Edit the =WEB-INF/classes/nutch-default.xml= file and add the clustering plugin to =plugin.includes= (the plugin is ignored by default):

<property>
  <name>plugin.includes</name>
  <value>protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)|clustering-carrot2</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.  By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>

* Restart Tomcat.

* Reload the Nutch search page. You should see the =clustering= checkbox next to the =search= button. Enable it and rerun your query. Clustered results should appear to the right.
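
The =plugin.includes= value from the installation steps above is a regular expression over plugin directory names. The Python sketch below illustrates which names such an expression would select. It rests on two assumptions that this page does not confirm: the whole directory name has to match (hence `re.fullmatch`), and the names `parse-pdf` and `protocol-ftp` are hypothetical directories used only for illustration. The helper `included_plugins` is likewise made up for this example and is not part of Nutch.

```python
import re

# Value of plugin.includes as given in the installation steps above.
PLUGIN_INCLUDES = (
    "protocol-http|parse-(text|html)|index-basic"
    "|query-(basic|site|url)|clustering-carrot2"
)


def included_plugins(directory_names, pattern=PLUGIN_INCLUDES):
    """Return the directory names selected by the expression.

    Assumption: a name is included only if the expression matches
    the whole name, so re.fullmatch is used rather than re.search.
    """
    regex = re.compile(pattern)
    return [name for name in directory_names if regex.fullmatch(name)]


if __name__ == "__main__":
    # parse-pdf and protocol-ftp are hypothetical names, for illustration only.
    candidates = [
        "protocol-http",
        "parse-html",
        "parse-pdf",
        "clustering-carrot2",
        "protocol-ftp",
    ]
    for name in included_plugins(candidates):
        print(name)
```

Passing an expression without the `clustering-carrot2` alternative to the same helper drops that directory from the result, which is consistent with the note above that the clustering plugin is ignored by default.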


= Command Line Options of bin/nutch =

See also inputs and outputs of different tools.

||'''command'''||'''function'''||
||bin/nutch admin options||database administration, including creation||
||bin/nutch inject options||inject new urls into the database||
||bin/nutch generate options||generate new segments to fetch||
||bin/nutch fetchlist||print the fetchlist of a segment||
||bin/nutch fetch options||fetch a segment's pages||
||bin/nutch index||run the indexer on a segment's fetcher output||
||bin/nutch merge options||merge several segment indexes||
||bin/nutch dedup||remove duplicates from a set of segment indexes||
||bin/nutch updatedb||update database from a segment's fetcher output||
||bin/nutch readdb options||examine arbitrary fields of the database||
||bin/nutch analyze||adjust database link-analysis scoring||
||bin/nutch server||run a search server||

-- MatthiasJaekle - 13 Mar 2004