Posted to user@nutch.apache.org by jian chen <ch...@gmail.com> on 2007/11/07 07:51:40 UTC
multiple crawl-urlfilter.txt files for different sites
Hi,
For crawling multiple sites, I am wondering if I can configure a separate
crawl-urlfilter.txt for each site. It seems to me Nutch has just one
urlfilter file to configure, even when crawling multiple sites.
Please help.
Thanks,
Jian
Re: multiple crawl-urlfilter.txt files for different sites
Posted by Alvaro Cabrerizo <to...@gmail.com>.
Hi:
You can edit the "bin/nutch" shell script and add a command-line parameter
(conf dir). You have to ensure that the conf_dir value you pass is included in
the classpath, since that is where Nutch looks for its configuration files.
Here is the main line you have to modify in the shell script:
# CLASSPATH initially contains $NUTCH_CONF_DIR, or defaults to $NUTCH_HOME/conf
CLASSPATH=${NUTCH_CONF_DIR:=$NUTCH_HOME/conf}
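
Judging by that line, the script already picks up NUTCH_CONF_DIR from the
environment when it is set, so it may be enough to keep one conf directory per
site and set the variable for each crawl, without editing the script at all.
Below is a minimal sketch of that idea. The /opt/nutch location, the
conf-siteA and conf-siteB directories, and the urls-* and crawl-* names are
all hypothetical; the sketch only prints the commands rather than running them.

```shell
#!/bin/sh
# Resolve the conf dir the same way the CLASSPATH line in bin/nutch does.
# The first argument stands in for NUTCH_HOME (hypothetical path).
resolve_conf_dir() {
  NUTCH_HOME="$1"
  # ${VAR:=default} keeps VAR if it is set and non-empty,
  # otherwise it assigns the default to VAR and uses it.
  CLASSPATH=${NUTCH_CONF_DIR:=$NUTCH_HOME/conf}
  echo "$CLASSPATH"
}

# Print (do not run) one crawl command per site, each with its own conf dir.
# Each conf-<site> directory is assumed to hold its own crawl-urlfilter.txt.
for site in siteA siteB; do
  conf=$(NUTCH_CONF_DIR="/opt/nutch/conf-$site"; resolve_conf_dir /opt/nutch)
  echo "NUTCH_CONF_DIR=$conf bin/nutch crawl urls-$site -dir crawl-$site"
done
```

Each conf directory would need to be a full copy of the default conf, with
only crawl-urlfilter.txt changed for that site.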
Hope it helps.