Posted to user@nutch.apache.org by webdev1977 <we...@gmail.com> on 2011/08/15 14:59:20 UTC
Is running Nutch in pseudo-distributed mode really worth it?
I have been looking at the pros and cons of running Nutch locally in
pseudo-distributed mode. I have a very large machine with lots of
processors and memory (16 GB). I am not able to get more machines to set up a
proper Hadoop cluster.
Is it worth the overhead to set up Hadoop in pseudo-distributed mode? Will I
see any gains in fetching large amounts of content from only three domains?
If it is worth it, can anyone point me to a good tutorial/post for setting
it up?
--
View this message in context: http://lucene.472066.n3.nabble.com/Is-running-nutch-in-psuedo-distributed-mode-really-worth-it-tp3255677p3255677.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Is running Nutch in pseudo-distributed mode really worth it?
Posted by Markus Jelsma <ma...@openindex.io>.
Yes, it should. 1.2 was the first release not bundled with Hadoop, so it
should work out fine.
On Thursday 18 August 2011 14:51:16 webdev1977 wrote:
> The tutorial that exists on the Nutch wiki is for versions < 1.3. Does it
> still generally apply to Nutch 1.3?
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Is running Nutch in pseudo-distributed mode really worth it?
Posted by webdev1977 <we...@gmail.com>.
The tutorial that exists on the Nutch wiki is for versions < 1.3. Does it
still generally apply to Nutch 1.3?
Re: Is running Nutch in pseudo-distributed mode really worth it?
Posted by Markus Jelsma <ma...@openindex.io>.
On Monday 15 August 2011 14:59:20 webdev1977 wrote:
> I have been looking at the pros and cons of running Nutch locally in
> pseudo-distributed mode. I have a very large machine with lots of
> processors and memory (16 GB). I am not able to get more machines to set up
> a proper Hadoop cluster.
>
> Is it worth the overhead to set up Hadoop in pseudo-distributed mode? Will I
> see any gains in fetching large amounts of content from only three domains?
You have many cores that you aren't utilizing right now, and you can put them
to work in pseudo-distributed mode. Fetching probably won't go faster, since
it isn't the real bottleneck in many cases. The slow jobs are parsing,
updating the crawldb (if it is large), and merging the linkdb (terrible
performance).
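As a rough illustration of why fetching from only three domains is unlikely to speed up with more cores, here is a minimal back-of-the-envelope sketch. It assumes the crawler waits a fixed politeness delay between successive requests to the same host and uses one connection per host; the 5-second delay and the function names are assumptions chosen for the example, so check the fetcher delay settings in your own Nutch configuration before relying on the numbers.

```python
def max_fetch_rate(num_hosts, delay_seconds):
    """Upper bound on pages per second under a per-host politeness delay.

    Assumption: one request at a time per host, with delay_seconds between
    successive requests to the same host. Download time is ignored, so the
    real rate will be lower than this bound.
    """
    if num_hosts <= 0:
        raise ValueError("num_hosts must be positive")
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return num_hosts / delay_seconds


def hours_to_fetch(num_pages, num_hosts, delay_seconds):
    """Minimum wall-clock hours needed to fetch num_pages at that bound."""
    rate = max_fetch_rate(num_hosts, delay_seconds)
    return num_pages / rate / 3600.0


if __name__ == "__main__":
    # Hypothetical crawl: three hosts, 5 s delay per host, one million pages.
    hosts, delay, pages = 3, 5.0, 1_000_000
    print("max pages/second:", max_fetch_rate(hosts, delay))
    print("minimum hours:", round(hours_to_fetch(pages, hosts, delay), 1))
```

Under these assumptions the bound depends only on the number of hosts and the delay, not on the number of cores or fetcher threads, which is consistent with the point that the extra cores mostly help the parse, crawldb update, and linkdb jobs.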
>
> If it is worth it, can anyone point me to a good tutorial/post for setting
> it up?
Google hadoop nutch tutorial?