Posted to user@nutch.apache.org by Alexander Aristov <al...@gmail.com> on 2009/04/21 14:21:58 UTC
running two crawlers at the same time
Hi all
I want to run two crawlers on a single server at the same time, across
different seed lists.
The question is:
Is it safe to use a single set of binaries? I have developed scripts to
specify different input/output locations, but I wonder whether Nutch
creates temporary folders during its work that I cannot control, making
it possible for the two crawlers to overlap working data.
Thanks
Alexander Aristov
Re: running two crawlers at the same time
Posted by Dennis Kubes <ku...@apache.org>.
Alexander Aristov wrote:
> Hi all
>
> I want to run two crawlers on a single server at the same time, across
> different seed lists.
>
> The question is:
>
> Is it safe to use a single set of binaries? I have developed scripts to
> specify different input/output locations, but I wonder whether Nutch
> creates temporary folders during its work that I cannot control, making
> it possible for the two crawlers to overlap working data.
There aren't any conflicts in running multiple crawl jobs at the same
time as long as they output to different directories. You do need to be
careful about ordering if you are generating fetch lists from a single
crawldb and then updating back into that same crawldb.
Dennis
>
> Thanks
> Alexander Aristov
>
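To illustrate Dennis's caveat: when several jobs share one crawldb, the generate -> fetch -> updatedb cycle must run as a single ordered sequence per crawldb, never interleaved between jobs. The sketch below only echoes the commands rather than invoking Nutch; the subcommand names follow the Nutch 0.9/1.0-era CLI, and the paths and segment name are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: the per-crawldb cycle that must stay ordered. Paths are
# hypothetical; commands are echoed (dry run), not executed.

CRAWLDB=crawl/crawldb
SEGMENTS=crawl/segments

run() { echo "bin/nutch $*"; }   # dry-run stand-in for the real binary

run generate "$CRAWLDB" "$SEGMENTS"              # pick a fetch list
run fetch "$SEGMENTS/SEGMENT_DIR"                # fetch it (placeholder segment)
run updatedb "$CRAWLDB" "$SEGMENTS/SEGMENT_DIR"  # fold results back in
# Only after updatedb completes is it safe for another job to generate
# from the same crawldb.
```

With fully separate crawldbs (one per job), this ordering concern disappears entirely.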
Re: running two crawlers at the same time
Posted by Alex Basa <al...@yahoo.com>.
It's not a problem. I've run up to 30 at a time on a single blade server, each using a different list and outputting to a different directory, and didn't see any cross-pollination.
--- On Tue, 4/21/09, Alexander Aristov <al...@gmail.com> wrote:
> From: Alexander Aristov <al...@gmail.com>
> Subject: running two crawlers at the same time
> To: nutch-user@lucene.apache.org
> Date: Tuesday, April 21, 2009, 7:21 AM
> Hi all
>
> I want to run two crawlers on a single server at the same time, across
> different seed lists.
>
> The question is:
>
> Is it safe to use a single set of binaries? I have developed scripts to
> specify different input/output locations, but I wonder whether Nutch
> creates temporary folders during its work that I cannot control, making
> it possible for the two crawlers to overlap working data.
>
> Thanks
> Alexander Aristov
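The setup Alex describes can be sketched as below: independent crawls on one box, each with its own seed list and its own output directory, so no crawldb, segments, or index is ever shared. The seed-list and directory names are made up, and the legacy one-shot `bin/nutch crawl` command is only echoed here rather than run.

```shell
#!/bin/sh
# Sketch: launch N fully independent crawls on one server (dry run).
# "seeds-$job" and "crawl-$job" are hypothetical names; in a real run
# each echoed line would be executed with a trailing '&'.

for job in a b; do
  mkdir -p "crawl-$job"   # each job gets its own -dir, nothing shared
  echo "bin/nutch crawl seeds-$job -dir crawl-$job -depth 3 -topN 1000 &"
done
# followed by:  wait   # for all background crawls to finish
```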