Posted to user@nutch.apache.org by pepe3059 <pe...@gmail.com> on 2012/03/27 00:19:26 UTC

divide fetch process ?

Hello, I have some questions; sorry if they are very basic.

Is there a way to divide the "fetch process" between two or
more computers that use distinct internet connections? Maybe
divide the load from the crawldb into segments and then run
a merge process on them afterwards? Is Hadoop only for storage sharing?

I hope you can help me. I'm running a crawl, but it's too slow
for one machine. Any suggestion or tip is welcome. Thank you.

Re: divide fetch process ?

Posted by remi tassing <ta...@gmail.com>.

I think that is exactly what Hadoop does!
Start here: http://wiki.apache.org/nutch/NutchHadoopTutorial
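
To make the idea of dividing a fetch list between machines concrete, here is a minimal sketch in Python. It is not Nutch or Hadoop code: the function name partition_by_host, the choice of a CRC32 hash, and the choice to group URLs by host (so that one machine handles all pages of a given site) are assumptions made for illustration only.

```python
from urllib.parse import urlsplit
from zlib import crc32


def partition_by_host(urls, num_fetchers):
    """Split urls into num_fetchers lists, keeping each host in one list.

    Hypothetical helper for illustration; not part of Nutch or Hadoop.
    """
    parts = [[] for _ in range(num_fetchers)]
    for url in urls:
        # Use the host name as the partition key; fall back to "" if absent.
        host = urlsplit(url).hostname or ""
        # crc32 is stable across runs, unlike Python's built-in str hash.
        index = crc32(host.encode("utf-8")) % num_fetchers
        parts[index].append(url)
    return parts


if __name__ == "__main__":
    # Made-up URLs, used only to show the call.
    sample = [
        "http://a.example/1",
        "http://a.example/2",
        "http://b.example/x",
    ]
    for i, part in enumerate(partition_by_host(sample, 2)):
        print(i, part)
```

Each resulting list could then be fetched by a different machine and the results merged afterwards, which is the kind of work distribution the tutorial above sets up with Hadoop.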
