Posted to user@nutch.apache.org by jian chen <ch...@gmail.com> on 2007/11/07 07:51:40 UTC

multiple crawl-urlfilter.txt files for different sites

Hi,

For crawling multiple sites, I am wondering if I can configure a separate
crawl-urlfilter.txt for each site. It seems to me Nutch has just one
urlfilter file to configure, even for multiple sites.

Please help.

Thanks,

Jian

Re: multiple crawl-urlfilter.txt files for different sites

Posted by Alvaro Cabrerizo <to...@gmail.com>.
Hi:

You can edit the "bin/nutch" shell script and add a command-line parameter
for the configuration directory (conf dir). You have to ensure that the
conf dir value you pass is included in the classpath.

Here is the main line you have to modify in the shell script:

# CLASSPATH initially contains $NUTCH_CONF_DIR, or defaults to $NUTCH_HOME/conf
CLASSPATH=${NUTCH_CONF_DIR:=$NUTCH_HOME/conf}
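
To illustrate how that line picks the conf dir, here is a minimal sketch of the same ${VAR:=default} expansion run on its own. The /opt/nutch install path, the conf-siteA directory name, and the initial_classpath helper are assumptions made up for this example; they are not part of Nutch.

```shell
#!/bin/sh
# Hypothetical install location, for illustration only.
NUTCH_HOME=/opt/nutch

# Hypothetical helper that mirrors the CLASSPATH line from bin/nutch.
# The body runs in a subshell so the caller's variables stay untouched.
initial_classpath() (
  NUTCH_CONF_DIR=$1
  # If NUTCH_CONF_DIR is unset or empty, ":=" assigns the default to it
  # and then expands to that default; otherwise the given value is used.
  CLASSPATH=${NUTCH_CONF_DIR:=$NUTCH_HOME/conf}
  printf '%s\n' "$CLASSPATH"
)

initial_classpath ""                        # prints /opt/nutch/conf
initial_classpath "$NUTCH_HOME/conf-siteA"  # prints /opt/nutch/conf-siteA
```

Going by that line alone, exporting NUTCH_CONF_DIR before calling the script should have the same effect as a new command-line parameter, provided nothing earlier in your copy of bin/nutch overrides it. Each per-site directory would presumably need its own crawl-urlfilter.txt alongside a copy of the other configuration files.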


Hope it helps.


