Posted to user@nutch.apache.org by co...@complexityintelligence.com on 2012/01/09 12:30:01 UTC

Multiple nutch setup

Hello,

   If I want to crawl a set A of pages and a set B of pages,
but using a config(A) for A and a config(B) for B, what is
the suggested 'best strategy'?

   In my mind:

      $NUTCH_HOME/runtime   -> Keep it as a 'vanilla' reference
      $NUTCH_HOME/runtime_A -> A 'clone' of 'vanilla' with a custom setup for set A
      $NUTCH_HOME/runtime_B -> A 'clone' of 'vanilla' with a custom setup for set B

   Is there a better way? Is my setup recommended or not?

   Multiple full installations (multiple $NUTCH_HOME) seem like
overkill, and I think they would be preferable only if config(A)
and config(B) have to use different Nutch versions.

Alessio


Re: Multiple nutch setup

Posted by Markus Jelsma <ma...@openindex.io>.
When running locally, I simply use a different NUTCH_CONF_DIR and NUTCH_LOG_DIR
for each setup and point to segments in different paths.
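
A minimal sketch of how this might look in practice, assuming a single
local runtime and one conf/log directory per page set. The directory
names (conf_A, logs_A, crawl_A), the default install path, and the
helper functions nutch_env and run_nutch are hypothetical and only for
illustration; NUTCH_CONF_DIR and NUTCH_LOG_DIR are the variables named
above.

```shell
#!/bin/sh
# Sketch only: paths and helper names below are assumptions, not Nutch defaults.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"   # assumed install location

# Dry run: print the environment a crawl of the given set would use.
nutch_env() {
  set_name="$1"
  echo "NUTCH_CONF_DIR=$NUTCH_HOME/conf_$set_name"
  echo "NUTCH_LOG_DIR=$NUTCH_HOME/logs_$set_name"
}

# Run bin/nutch for one set with its own conf and log directories.
# The variables are set only for this single invocation.
run_nutch() {
  set_name="$1"
  shift
  NUTCH_CONF_DIR="$NUTCH_HOME/conf_$set_name" \
  NUTCH_LOG_DIR="$NUTCH_HOME/logs_$set_name" \
    "$NUTCH_HOME/runtime/local/bin/nutch" "$@"
}

# Example usage (not executed here); each set keeps its own crawl paths:
#   run_nutch A generate crawl_A/crawldb crawl_A/segments
#   run_nutch B generate crawl_B/crawldb crawl_B/segments
```

With this layout, conf_A and conf_B would each start as a copy of the
stock conf directory, so the runtime itself is shared and only the
configuration, logs, and crawl data differ per set.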

-- 
Markus Jelsma - CTO - Openindex