Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/11/25 18:09:04 UTC
[Hadoop Wiki] Update of "HadoopIsNot" by SteveLoughran
The "HadoopIsNot" page has been changed by SteveLoughran.
The comment on this change is: More details on what you need to know before you get started.
http://wiki.apache.org/hadoop/HadoopIsNot?action=diff&rev1=4&rev2=5
--------------------------------------------------
== Hadoop clusters are not a place to learn Unix/Linux system administration ==
- You need to know your way round a Unix/Linux system. How to install it, what the various files in /etc/ are for, how to set up networking, what is a good hosts table, debug DNS problems, why to keep logs on a separate disk from the root disk, etc. If you cannot look after a single machine, you aren't going to be able to handle a cluster of 80 of them. That said, don't try maintaining those 80+ boxes using the same technique of hand-editing files lile [[/etc/hosts]], because it doesn't scale.
+ You need to know your way round a Unix/Linux system: how to install it, what the various files in /etc/ are for, how to set up networking, what a good hosts table looks like, how to debug DNS problems, why to keep logs on a separate disk from the root disk, etc. If you cannot look after a single machine, you aren't going to be able to handle a cluster of 80 of them. That said, don't try maintaining those 80+ boxes using the same technique of hand-editing files like [[/etc/hosts]], because it doesn't scale.
+
+ Things you need to know
+
+ * SSH, what it is, how to set up authorized_keys, how to use ssh and scp
+ * ifconfig, nslookup and other network config/diagnostics tools
+ * How your platform keeps itself up to date
+ * What the various log files your machine generates are, and what they mean
+ * How to set up native filesystems and mount them
+
+ This is important. If you don't know these, you are out of your depth and should not start installing Hadoop until you have a couple of Linux systems up and running with the basics in place: you can ssh in to each of them without entering a password, they know each other's hostnames, and so on. The Hadoop installation documents all assume you can do these things, and aren't going to bother explaining them.
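One way to check the prerequisites above before installing anything is a small script run from one machine against the others. The following is only a sketch, not part of Hadoop: the function names are made up for illustration, and it assumes an OpenSSH client is on the PATH (`BatchMode=yes` makes ssh fail rather than prompt for a password, and `ConnectTimeout` bounds the wait).

```python
import socket
import subprocess


def resolves(hostname):
    """Return True if the local resolver can map hostname to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def ssh_check_command(hostname):
    """Build an ssh command that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            hostname, "true"]


def passwordless_ssh_works(hostname):
    """Return True if key-based login to hostname succeeds without a prompt."""
    try:
        result = subprocess.run(ssh_check_command(hostname),
                                capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung
        return False
    return result.returncode == 0


def check_hosts(hostnames):
    """Map each hostname to a (name resolves, passwordless ssh works) pair."""
    report = {}
    for host in hostnames:
        ok_dns = resolves(host)
        # Only attempt ssh when the name resolves at all
        report[host] = (ok_dns, ok_dns and passwordless_ssh_works(host))
    return report
```

Calling `check_hosts(["node1", "node2"])` (hypothetical hostnames) returns a dictionary in which any `False` entry points at a name resolution or authorized_keys problem to fix before going further.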
== Hadoop Filesystem is not a substitute for a High Availability SAN-hosted FS ==