Posted to user@nutch.apache.org by Nayanish Hinge <na...@gmail.com> on 2010/08/27 12:49:35 UTC
Nutch custom URL partitioner to divide seed URLs equally across
slave hosts
Hi,
I have already asked this question in another forum but have not received an
answer yet:
http://stackoverflow.com/questions/3575441/nutch-custom-url-partitioner
Could somebody shed some light?
---------
Hi, I am writing a custom search task using Nutch for an intranet crawl,
running on Hadoop. I want to spawn the task across multiple Hadoop slaves
by dividing the seed URLs evenly among them. I understand this division is
the partitioner's job.
The default Nutch URLPartitioner implementation partitions URLs by
host, domain, or IP. I want to override that behavior and simply divide the
seeds equally, based on the maxthreads value I pass on the command line.
Could I do that with simple config changes, without rewriting the
Partitioner?
*EDIT*
The custom search task is being written by rewriting Crawl.java.
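Roughly, what I am after is partition logic like the sketch below. The class
and method names are illustrative only (in a real job this would implement
Hadoop's Partitioner interface and be registered on the JobConf); hashing the
full URL string, rather than its host or domain, is one deterministic way to
spread seeds evenly, though a stateful counter would give strict round-robin:

```java
// Illustrative sketch: partition by hashing the whole URL so that seeds
// from the same host can still land on different partitions, unlike the
// default host/domain/IP grouping. Not tested against Nutch.
public class EvenUrlPartition {

    // Modeled on Partitioner.getPartition(key, value, numPartitions);
    // the mask keeps the hash non-negative before the modulo.
    public static int getPartition(String url, int numPartitions) {
        return (url.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] seeds = {
            "http://intranet/a", "http://intranet/b",
            "http://intranet/c", "http://intranet/d"
        };
        // All four seeds share a host, yet are spread over 3 partitions.
        for (String s : seeds) {
            System.out.println(s + " -> partition " + getPartition(s, 3));
        }
    }
}
```

If there is a configuration knob that achieves the same effect without code,
that would obviously be preferable.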
----------
Nayanish