Posted to user@nutch.apache.org by Michael Coffey <mc...@yahoo.com.INVALID> on 2016/10/30 17:19:35 UTC

Re: Nutch 1.x or 2.x

Newbie question: I am trying to decide between Nutch 1.x and 2.x. The application is to crawl a large portion of the web using a massive number (thousands) of small machines (<= 2 GB RAM each). I like the idea of the simpler architecture and pluggable storage backend of 2.x. However, I am concerned about things I've read about 2.x being less stable and possibly less efficient than 1.x. Are these concerns valid at this time?




   

Re: Nutch 1.x or 2.x

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi Michael,

The concerns are related to Gora, as discussed here:
https://www.quora.com/Compared-to-Nutch-2-x-why-does-Nutch-1-x-have-a-better-performance
I think you have also seen the comparison of Nutch 1.7 and Nutch 2.2.1:
http://digitalpebble.blogspot.com.tr/2013/09/nutch-fight-17-vs-221.html

However, Gora is getting better. For example, the problem mentioned in that
blog post has been solved: https://issues.apache.org/jira/browse/GORA-119

I've used Nutch 2.x for large-scale crawling and everything was fine.
However, the servers had much more than 2 GB of memory. Since your memory
is very limited, I think you should run a test and try it yourself.

Kind Regards,
Furkan KAMACI
