Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/11/25 18:09:04 UTC

[Hadoop Wiki] Update of "HadoopIsNot" by SteveLoughran

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HadoopIsNot" page has been changed by SteveLoughran.
The comment on this change is: More details on what you need to know before you get started.
http://wiki.apache.org/hadoop/HadoopIsNot?action=diff&rev1=4&rev2=5

--------------------------------------------------

  
  == Hadoop clusters are not a place to learn Unix/Linux system administration ==
  
- You need to know your way round a Unix/Linux system. How to install it, what the various files in /etc/ are for, how to set up networking, what is a good hosts table, debug DNS problems, why to keep logs on a separate disk from the root disk, etc. If you cannot look after a single machine, you aren't going to be able to handle a cluster of 80 of them. That said, don't try maintaining those 80+ boxes using the same technique of hand-editing files lile [[/etc/hosts]], because it doesn't scale.
+ You need to know your way round a Unix/Linux system: how to install it, what the various files in /etc/ are for, how to set up networking, what a good hosts table looks like, how to debug DNS problems, why to keep logs on a separate disk from the root disk, and so on. If you cannot look after a single machine, you are not going to be able to handle a cluster of 80 of them. That said, don't try to maintain those 80+ boxes by hand-editing files like [[/etc/hosts]] on each one, because that doesn't scale.
+ 
+ Things you need to know:
+ 
+  * SSH: what it is, how to set up authorized_keys, and how to use ssh and scp
+  * ifconfig, nslookup and other network config/diagnostics tools
+  * How your platform keeps itself up to date
+  * What the various log files your machine generates are, and what they mean
+  * How to set up native filesystems and mount them
+ 
+ This is important. If you don't know these things, you are out of your depth and should not start installing Hadoop until you have a couple of Linux systems up and running with the basics in place: you can ssh in to each of them without entering a password, each machine knows the others' hostnames, and so on. The Hadoop installation documents assume you can do all of this and do not explain any of it.
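As a rough way to check the two prerequisites named above (hostnames that resolve, and ssh logins that need no password), something like the following Python sketch may help. It is not part of Hadoop or of the wiki page; the function names are made up for illustration, and it assumes an OpenSSH-style `ssh` client on the PATH, whose `BatchMode=yes` option makes ssh fail rather than prompt for a password.

```python
import socket
import subprocess


def resolves(hostname, lookup=socket.gethostbyname):
    """Return True if the hostname resolves to an address."""
    try:
        lookup(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def ssh_check_command(hostname):
    """Build an ssh command that runs `true` remotely and never prompts."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            hostname, "true"]


def passwordless_ssh_works(hostname, run=subprocess.run):
    """Return True if ssh logs in and runs a command without a password."""
    try:
        result = run(ssh_check_command(hostname),
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:  # e.g. no ssh client installed
        return False
    return result.returncode == 0


def check_hosts(hostnames, lookup=socket.gethostbyname, run=subprocess.run):
    """Map each hostname to its DNS and ssh status; skip ssh if DNS fails."""
    report = {}
    for host in hostnames:
        dns_ok = resolves(host, lookup)
        ssh_ok = dns_ok and passwordless_ssh_works(host, run)
        report[host] = {"resolves": dns_ok, "ssh": ssh_ok}
    return report
```

Calling `check_hosts(["node1", "node2"])` (hypothetical hostnames) from each machine in turn should show whether every box can reach every other one. The `lookup` and `run` parameters exist only so the checks can be exercised without a real network.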
  
  == Hadoop Filesystem is not a substitute for a High Availability SAN-hosted FS ==