Posted to user@nutch.apache.org by Paul Tomblin <pt...@xcski.com> on 2009/07/29 10:33:12 UTC

Include/exclude lists

Is there any way other than the config files to specify the url filter
parameters?  I have a few dozen sites to crawl, and for each site I
want to specify its own includes and excludes.  I don't want to have
to go into the config file and change the
<property><name>urlfilter.regex.file</name> each time.  Can I specify
that on the command line to bin/nutch generate or something?

-- 
http://www.linkedin.com/in/paultomblin

Re: Include/exclude lists

Posted by reinhard schwab <re...@aon.at>.
I would suggest implementing a urlfilter plugin that does this by
mapping hosts to regexp rules.
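
A minimal sketch of the host-to-rules idea, in plain Java so it stands on
its own. The class name HostRegexFilter and the addRule method are
hypothetical and not part of Nutch. The sketch assumes the "+regex" /
"-regex" rule style of regex-urlfilter.txt, where the first matching rule
decides and a URL that matches nothing is rejected, and it assumes that
hosts with no rules are rejected too. A real plugin would have to
implement Nutch's URLFilter extension point and load its rules from
configuration, which is left out here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical per-host URL filter; an illustration, not Nutch code. */
final class HostRegexFilter {

    /** One rule: include (true) or exclude (false) URLs matching the pattern. */
    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, String regex) {
            this.include = include;
            this.pattern = Pattern.compile(regex);
        }
    }

    /** Rules per lower-cased host name, kept in the order they were added. */
    private final Map<String, List<Rule>> rulesByHost = new HashMap<>();

    /** Registers a rule written as "+regex" (include) or "-regex" (exclude). */
    void addRule(String host, String signedRegex) {
        if (signedRegex.length() < 2
                || (signedRegex.charAt(0) != '+' && signedRegex.charAt(0) != '-')) {
            throw new IllegalArgumentException("rule must start with + or -: " + signedRegex);
        }
        boolean include = signedRegex.charAt(0) == '+';
        rulesByHost
                .computeIfAbsent(host.toLowerCase(Locale.ROOT), h -> new ArrayList<>())
                .add(new Rule(include, signedRegex.substring(1)));
    }

    /** Returns the URL if it is accepted, or null if it is rejected. */
    String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparsable URL: reject
        }
        if (host == null) {
            return null; // no host component: reject
        }
        List<Rule> rules = rulesByHost.get(host.toLowerCase(Locale.ROOT));
        if (rules == null) {
            return null; // assumption: hosts without rules are not crawled
        }
        for (Rule rule : rules) {
            // The first rule whose pattern is found anywhere in the URL decides.
            if (rule.pattern.matcher(url).find()) {
                return rule.include ? url : null;
            }
        }
        return null; // no rule matched: reject
    }
}
```

With the rules "-\.(gif|jpg|png)$" and "+^http://example\.com/docs/"
registered in that order for the host example.com, filter() would return
http://example.com/docs/index.html unchanged and return null for
http://example.com/docs/logo.png and for any URL on another host.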

Paul Tomblin wrote:
> Is there any way other than the config files to specify the url filter
> parameters?  I have a few dozen sites to crawl, and for each site I
> want to specify its own includes and excludes.  I don't want to have
> to go into the config file and change the
> <property><name>urlfilter.regex.file</name> each time.  Can I specify
> that on the command line to bin/nutch generate or something?