Posted to user@nutch.apache.org by 許懷文 <k1...@gmail.com> on 2012/12/24 13:18:58 UTC

About the version of the nutch

Dear Nutch Project Team:

I am interested in Nutch and Hadoop and want to apply them to big
data analysis, but I am unsure which versions to use.
I want to set up a search engine myself, and I have chosen
Hadoop+Nutch+Solr+HBase to implement it.
Would you mind recommending suitable versions of each for this setup? I would
appreciate your kind reply and helpful suggestions.
Thanks!
Best regards,
Kevin Hsu.

RE: About the version of the nutch

Posted by Markus Jelsma <ma...@openindex.io>.
Hi - it depends on the estimated size of your data and the available hardware. You can simply get the current 1.0.x stable or 1.1.x beta Hadoop version; both will run fine. The real choice is which Nutch to use. 1.x is very stable, has more features, and can be used for very large-scale crawls, although you might need a bit more hardware. 2.x is more efficient at reading and writing data but also less stable, so you will run into more problems that divert you from your core tasks.

If you have a few powerful machines and your data is in the TB range, 1.x is fine. If you like a challenge, 2.x is the way to go. We process many TBs each month on just a few powerful machines and run a modified 1.x.
 

Re: About the version of the nutch

Posted by Tejas Patil <te...@gmail.com>.
http://wiki.apache.org/nutch/Nutch2Tutorial
http://techvineyard.blogspot.com/2010/12/build-nutch-20.html

Those pages do not mention which Hadoop version to use. (In your place, I
would try Hadoop 0.20.2.)

Thanks,
Tejas Patil

