Posted to user@nutch.apache.org by webdev1977 <we...@gmail.com> on 2011/08/15 14:59:20 UTC

Is running Nutch in pseudo-distributed mode really worth it?

I have been looking at the pros and cons of running Nutch locally in
pseudo-distributed mode. I have a very large machine with lots of
processors and memory (16 GB). I am not able to get more machines to set up a
proper Hadoop cluster.

Is it worth the overhead to set up Hadoop in pseudo-distributed mode? Will I
see any gains in fetching large amounts of content from only three domains?

If it is worth it, can anyone point me to a good tutorial/post for setting
it up?



--
View this message in context: http://lucene.472066.n3.nabble.com/Is-running-nutch-in-psuedo-distributed-mode-really-worth-it-tp3255677p3255677.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Is running Nutch in pseudo-distributed mode really worth it?

Posted by Markus Jelsma <ma...@openindex.io>.
Yes, it should. 1.2 was the first release not bundled with Hadoop. It should
work out fine.


On Thursday 18 August 2011 14:51:16 webdev1977 wrote:
> The tutorial that exists on the Nutch wiki is for versions < 1.3. Does it
> still generally apply to Nutch 1.3?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Is running Nutch in pseudo-distributed mode really worth it?

Posted by webdev1977 <we...@gmail.com>.
The tutorial that exists on the Nutch wiki is for versions < 1.3. Does it
still generally apply to Nutch 1.3?


Re: Is running Nutch in pseudo-distributed mode really worth it?

Posted by Markus Jelsma <ma...@openindex.io>.

On Monday 15 August 2011 14:59:20 webdev1977 wrote:
> I have been looking at the pros and cons of running Nutch locally in
> pseudo-distributed mode. I have a very large machine with lots of
> processors and memory (16 GB). I am not able to get more machines to set up
> a proper Hadoop cluster.
> 
> Is it worth the overhead to set up Hadoop in pseudo-distributed mode? Will I
> see any gains in fetching large amounts of content from only three domains?

You have many cores that you aren't utilizing right now, which you could in
pseudo-distributed mode. Fetching probably won't go faster, since that's not a
real bottleneck in many cases. The slow jobs are parsing, updating the crawldb
(if it is large) or merging the linkdb (terrible performance).
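
As a rough sketch of how pseudo-distributed mode could put those idle cores to
work, the Python script below prints Hadoop site files for a single-node setup.
The property names are the Hadoop 0.20-era ones (`fs.default.name`,
`mapred.job.tracker`, `mapred.tasktracker.map.tasks.maximum` and so on). The
localhost ports, the `reserved_cores` parameter and the even map/reduce split
are assumptions for illustration, not values from this thread, so check them
against the Hadoop version you actually run.

```python
import os
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a dict of Hadoop properties as a *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def pseudo_distributed_settings(cores, reserved_cores=2):
    """Build per-file settings for a single-node Hadoop 0.20.x setup.

    Assumption (hypothetical heuristic): keep `reserved_cores` free for the
    Hadoop daemons and the OS, then split the rest between map and reduce slots.
    """
    usable = max(2, cores - reserved_cores)
    map_slots = max(1, usable // 2)
    reduce_slots = max(1, usable - map_slots)
    return {
        # Assumed localhost addresses; adjust to your own setup.
        "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
        # Only one datanode exists, so blocks cannot be replicated further.
        "hdfs-site.xml": {"dfs.replication": 1},
        "mapred-site.xml": {
            "mapred.job.tracker": "localhost:9001",
            # Slot limits control how many tasks run in parallel on this node.
            "mapred.tasktracker.map.tasks.maximum": map_slots,
            "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
        },
    }


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    for filename, props in pseudo_distributed_settings(cores).items():
        print("<!-- " + filename + " -->")
        print(hadoop_site_xml(props))
```

The map and reduce slot limits are what let jobs such as parsing and crawldb
updates run as several parallel tasks on one machine. Tune them to your
workload rather than taking the split here as a recommendation.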

> 
> If it is worth it, can anyone point me to a good tutorial/post for setting
> it up?

Have you tried Googling "hadoop nutch tutorial"?
