Posted to user@nutch.apache.org by Joseph Naegele <jn...@grierforensics.com> on 2016/05/16 18:40:17 UTC

pros/cons of many nodes

Hi folks,

Would anyone be willing to share a few pros/cons of using many nodes vs. one
very powerful machine for large-scale crawling? Of course, many of the
advantages and disadvantages overlap with those of Hadoop and distributed
computing in general, but what I'm actually looking for are good reasons not
to use a single machine for Nutch.

One example could be that more machines give you more IP addresses for
fetching, and therefore less chance of being blocked by web admins,
correct?

Joe