Posted to user@nutch.apache.org by Paul Tomblin <pt...@xcski.com> on 2009/07/29 10:33:12 UTC
Include/exclude lists
Is there any way other than the config files to specify the url filter
parameters? I have a few dozen sites to crawl, and for each site I
want to specify its own includes and excludes. I don't want to have
to go into the config file and change the
<property><name>urlfilter.regex.file</name> each time. Can I specify
that on the command line to bin/nutch generate or something?
--
http://www.linkedin.com/in/paultomblin
Re: Include/exclude lists
Posted by reinhard schwab <re...@aon.at>.
I would suggest implementing a urlfilter plugin that does this,
i.e. one that maps hosts to regexp rules.
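
A minimal sketch of what such a host-to-rules mapping could look like is below. It is not Nutch code: the class name HostRuleFilter, the addRule method, and the example.com rules are all hypothetical, and the "+regex" / "-regex" notation is only borrowed from the convention used in regex-urlfilter.txt. The sketch assumes that the first matching rule wins and that URLs from hosts without any rules are rejected. A real plugin would additionally have to implement Nutch's URLFilter extension point and be registered in the plugin configuration, which is not shown here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical per-host URL filter; a standalone sketch, not part of Nutch. */
final class HostRuleFilter {

    /** One rule: include (+) or exclude (-) URLs in which the pattern is found. */
    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, String regex) {
            this.include = include;
            this.pattern = Pattern.compile(regex);
        }
    }

    /** Rules per lower-cased host name, kept in insertion order. */
    private final Map<String, List<Rule>> rulesByHost = new HashMap<>();

    /** Registers a rule for a host; spec must look like "+regex" or "-regex". */
    void addRule(String host, String spec) {
        if (spec.length() < 2 || (spec.charAt(0) != '+' && spec.charAt(0) != '-')) {
            throw new IllegalArgumentException("rule must start with + or -: " + spec);
        }
        rulesByHost
            .computeIfAbsent(host.toLowerCase(Locale.ROOT), h -> new ArrayList<>())
            .add(new Rule(spec.charAt(0) == '+', spec.substring(1)));
    }

    /** Returns the URL unchanged if it is accepted, or null if it is rejected. */
    String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparsable URLs are rejected
        }
        if (host == null) {
            return null;
        }
        List<Rule> rules = rulesByHost.get(host.toLowerCase(Locale.ROOT));
        if (rules == null) {
            return null; // assumption: hosts without rules are not crawled
        }
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.include ? url : null; // first matching rule wins
            }
        }
        return null; // assumption: no matching rule means reject
    }
}
```

With the rules "-\.pdf$" and "+^http://example\.com/docs/" registered for example.com in that order, filter() would return URLs under http://example.com/docs/ unchanged, unless they end in .pdf, and return null for everything else. Whether unknown hosts should be rejected or passed through is a design choice that depends on the crawl.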
Paul Tomblin wrote:
> Is there any way other than the config files to specify the url filter
> parameters? I have a few dozen sites to crawl, and for each site I
> want to specify its own includes and excludes. I don't want to have
> to go into the config file and change the
> <property><name>urlfilter.regex.file</name> each time. Can I specify
> that on the command line to bin/nutch generate or something?