
Posted to dev@nutch.apache.org by Tejas Patil <te...@gmail.com> on 2014/01/23 21:11:15 UTC

Re: Right way to run crawl script in deploy mode

Correction: the subject of this message should have read:
"Right way to run crawl script in deploy mode"

~tejas

On Wed, Jan 22, 2014 at 7:56 PM, Tejas Patil <te...@gmail.com> wrote:

> Hi nutch-dev,
>
> I was assuming that the command to run the bin/crawl script is the same in
> both local and deploy mode, i.e. from $NUTCH_HOME/runtime/local (or
> runtime/deploy), use:
> > bin/crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>
>
> It turns out that in deploy mode, the script does not pick up the segment
> location from HDFS and runs into problems. The cause is this snippet in the
> crawl script: it looks for the job file in the parent of the current working
> directory (not the parent of the script's own directory), so the check fails
> when I run from runtime/deploy:
>
> mode=local
> if [ -f ../*nutch-*.job ]; then
>     mode=distributed
> fi
>
> When run from runtime/deploy/bin, it works properly.
> Shouldn't the command be consistent with that of local mode?
>
> Thanks,
> Tejas
>
>
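The mode check quoted above resolves ../*nutch-*.job against the caller's current working directory, which is why it only finds the job file when bin/crawl is started from runtime/deploy/bin. Below is a minimal sketch of one way to make the check location-independent: resolve the directory that holds the script and look for the job file relative to that. It assumes the job file always sits one level above the script's bin/ directory (as in runtime/deploy/). The detect_mode helper and the variable names are hypothetical; this is not the actual Nutch crawl script code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual Nutch bin/crawl code.
# Assumption: in deploy mode the *nutch-*.job file lives in the parent
# of the directory that contains this script (runtime/deploy/).

# Print "distributed" if a *nutch-*.job file exists in the parent of
# the given bin directory, otherwise print "local".
detect_mode() {
  local bin_dir="$1"
  # compgen -G succeeds only if the glob matches at least one file.
  # Unlike [ -f glob ], it does not fail when several files match.
  if compgen -G "$bin_dir/../*nutch-*.job" > /dev/null; then
    echo distributed
  else
    echo local
  fi
}

# Resolve the script's own directory so the result does not depend on
# where the user happens to be when invoking bin/crawl.
bin_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
mode="$(detect_mode "$bin_dir")"
echo "Running in $mode mode"
```

With a check along these lines, invoking bin/crawl from runtime/deploy or from runtime/deploy/bin would select the same mode, since the lookup no longer depends on the working directory.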