Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2013/03/20 22:35:39 UTC
[Nutch Wiki] Update of "bin/nutch solrindex" by kiranchitturi
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "bin/nutch solrindex" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/bin/nutch%20solrindex?action=diff&rev1=5&rev2=6
This class replaces the legacy dependency in Nutch <1.3 on indexing to Apache Lucene for subsequent search. We now pass a Solr URL (among other arguments) to post data crawled by Nutch for search within an Apache Solr core.
Note: This class currently commits once for all the reducers in one go. This is subject to change in subsequent versions of Nutch, as a commit can take a lot of resources (cache warming) and it is not always necessary to commit after solrindex, solrdedup or solrclean, especially if they are run one immediately after another.
+
+ === Nutch 1.x ===
Usage:
{{{
@@ -28, +30 @@
'''[-filter]''': Enable URL filtering.
'''[-normalize]''': Enable URL normalizing.
+
+ === Nutch 2.x ===
+
+ {{{
+ Usage: SolrIndexerJob <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]
+ }}}
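As a rough illustration of the 2.x usage string above, the sketch below assembles one possible invocation through the `bin/nutch solrindex` wrapper. The Solr URL and the crawl id are assumed placeholder values, not taken from the wiki page; `-all` is one of the three alternatives in the usage string and could be swapped for a specific batch id or `-reindex`.

```shell
# Assumed placeholder values for illustration only; substitute your own.
SOLR_URL="http://localhost:8983/solr/"
CRAWL_ID="testcrawl"

# Build the command from the usage string:
#   <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]
CMD="bin/nutch solrindex $SOLR_URL -all -crawlId $CRAWL_ID"

# Print the command for review before running it from the Nutch runtime directory.
echo "$CMD"
```

Printing the command first keeps the sketch safe to try on a machine without Nutch installed; the optional `-crawlId` argument can simply be dropped if it is not needed.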
CommandLineOptions