Posted to user@nutch.apache.org by Manish Verma <m_...@apple.com> on 2016/01/12 01:19:46 UTC
Distributed Crawling
Hello Friends,
I am using Nutch 1.10 and want to do distributed crawling for speed. Is this supported in Nutch 1.x or 2.x?
Is there any documentation on this?
Thanks Manish
Re: Distributed Crawling
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,
Nutch was designed as a *distributed* crawler.
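As a rough illustration of what "distributed" means here: the crawl steps run as Hadoop jobs, and the URLs to fetch are split across parallel tasks. The Python sketch below is conceptual only and is not Nutch code. It assumes partitioning by host name, so that all URLs of one host end up in the same fetch list; the function name partition_by_host and the example URLs are hypothetical.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def partition_by_host(urls, num_tasks):
    """Split URLs into at most num_tasks fetch lists, keyed by task index.

    All URLs that share a host name are assigned to the same task, so a
    single task can apply per-host politeness delays on its own.
    """
    fetch_lists = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname or ""
        # Use a stable hash (not Python's randomized hash()) so the same
        # host always maps to the same task across runs.
        digest = hashlib.md5(host.encode("utf-8")).digest()
        task = int.from_bytes(digest[:4], "big") % num_tasks
        fetch_lists[task].append(url)
    return dict(fetch_lists)


if __name__ == "__main__":
    # Hypothetical seed URLs, for illustration only.
    seeds = [
        "http://example.org/a",
        "http://example.org/b",
        "http://example.com/",
        "http://example.net/index.html",
    ]
    for task, task_urls in sorted(partition_by_host(seeds, 3).items()):
        print(task, task_urls)
```

With more tasks (and more machines to run them on), more hosts can be fetched in parallel, which is where the speed-up comes from.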
This tutorial should help:
https://wiki.apache.org/nutch/NutchHadoopTutorial
(It may be a little outdated, esp. for 1.11,
which switched from Hadoop 1.2 to 2.4.
We are grateful for any updates and additions.
Thanks!)
It's not easy to manage a Hadoop cluster, so you may want to
- first learn how to run Nutch in pseudo-distributed mode:
http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
- or run Nutch on a Hadoop cloud (e.g., on AWS)
There are many people sharing their experience out there,
just google for:
nutch distributed crawling
nutch aws
or have a look at Julien's recent video tutorial:
https://www.youtube.com/watch?v=v9zjcTjjjyU
Cheers,
Sebastian