Posted to user@nutch.apache.org by "Chaushu, Shani" <sh...@intel.com> on 2015/06/01 08:30:36 UTC

RE: Nutch 2.X vs. 1.X

Thanks!

-----Original Message-----
From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
Sent: Sunday, May 31, 2015 21:56
To: user@nutch.apache.org
Subject: Re: Nutch 2.X vs. 1.X

Hi Chaushu,

On Sun, May 31, 2015 at 12:30 AM, <us...@nutch.apache.org> wrote:

>
> I'm using Nutch 1.9 with Solr 4.10
> I wanted to ask what the advantages of Nutch 2 over Nutch 1 are, and, if 
> I use Solr, whether there is any reason I should use Nutch 2.
>

The Nutch 1.X branch is the more actively maintained of the two Nutch codebases. It receives more community contributions and has had more releases recently. Nutch 2.X should be used if you have a justified reason to access Nutch crawl data from one of the Gora-supported datastores, such as HBase. Both branches scale very well and work well on official Hadoop 1.X distributions. Nutch 2.X also works on Hadoop 2.X; I think we are still not quite at the point where Nutch 1.X is fully supported on Hadoop 2.X.


> (I understand that the difference is that Nutch 2 uses NoSQL, but if I 
> use Solr, I can access the data from there.)
>
>
Correct. There is a gora-solr module to which you can map your Nutch WebPages and Web Graph (WebDB), as well as your Host DB.
hth
Lewis