Posted to user@nutch.apache.org by Manish Verma <m_...@apple.com> on 2016/01/12 01:19:46 UTC

Distributed Crawling

Hello Friends,

I am using Nutch 1.10 and want to do distributed crawling for speed. Is this supported in Nutch 1.x or 2.x?
Is there any documentation on this?

Thanks,
Manish

Re: Distributed Crawling

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

Nutch was designed as a *distributed* crawler.
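In Nutch 1.x, each crawl step (generate, fetch, parse, updatedb) runs as a Hadoop MapReduce job, so the work spreads across however many nodes the cluster has. One idea behind this is that the generated fetch list is partitioned so that all URLs of one host go to the same fetcher task, which keeps per-host politeness local to one task (nutch-default.xml controls this with `partition.url.mode`, default `byHost`). The Python below is only a simplified, hypothetical sketch of host-based partitioning to illustrate the idea; `partition_by_host` is an illustrative name, not Nutch code, and Nutch's own partitioner is implemented in Java with a different hash.

```python
import zlib
from collections import defaultdict
from urllib.parse import urlsplit


def partition_by_host(urls, num_partitions):
    """Hypothetical sketch: split URLs into num_partitions fetch lists so
    that every URL of a given host lands in the same list."""
    parts = defaultdict(list)
    for url in urls:
        # urlsplit(...).hostname is already lowercased; fall back to "" if absent
        host = urlsplit(url).hostname or ""
        # crc32 is stable across runs, unlike Python's per-process salted hash()
        idx = zlib.crc32(host.encode("utf-8")) % num_partitions
        parts[idx].append(url)
    # Return one (possibly empty) list per partition, in partition order
    return [parts[i] for i in range(num_partitions)]


if __name__ == "__main__":
    seeds = [
        "http://example.com/a",
        "http://example.com/b",
        "https://nutch.apache.org/",
        "http://example.org/x",
    ]
    for i, fetch_list in enumerate(partition_by_host(seeds, 2)):
        print(i, fetch_list)
```

Both `example.com` URLs always end up in the same list, so a single fetcher can throttle requests to that host without coordinating with other nodes.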

This tutorial should help:
 https://wiki.apache.org/nutch/NutchHadoopTutorial
(it may be a little outdated, especially for 1.11,
 which switched from Hadoop 1.2 to 2.4;
 we are grateful for any updates and additions.
 Thanks!)

Managing a Hadoop cluster is not easy, so you may want to:
- first learn how to run
  Nutch in pseudo-distributed mode:
  http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
- or run Nutch on a Hadoop cloud service (e.g., on AWS)

There are many people sharing their experience out there,
just google for:
 nutch distributed crawling
 nutch aws
or have a look at Julien's recent video tutorial:
 https://www.youtube.com/watch?v=v9zjcTjjjyU

Cheers,
Sebastian
