You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2013/06/07 05:49:49 UTC

[Hadoop Wiki] Trivial Update of "Virtual Hadoop" by LukeLu

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Virtual Hadoop" page has been changed by LukeLu:
https://wiki.apache.org/hadoop/Virtual%20Hadoop?action=diff&rev1=12&rev2=13

  
  Ignoring low-level networking/clock issues, what does this mean? (Only valid for some cloud vendors, it may be different for other cloud vendors or you own your virtualized infrastructure.)
  
-  1. When you request a VM, it's performance may vary from previous requests (when lack of isolation feature/policy). This can be due to CPU differences, or the other workloads.
+  1. When you request a VM, it's performance may vary from previous requests (when missing isolation feature/policy). This can be due to CPU differences, or the other workloads.
   1. There is no point writing topology scripts, if cloud vendor doesn't expose physical topology to you in some way. OTOH, [http://serengeti.cloudfoundry.com/ Project Serengeti] configures the topology script automatically for vSphere.
-  1. All network ports must be closed by way of firewall and routing information, apart from those ports critical for Hadoop -which must then run with security on.
+  1. All network ports must be closed by way of firewall and routing information, apart from those ports critical for Hadoop, which must then run with security on.
   1. All data you wish to keep must be kept on permanent storage: mounted block stores, remote filesystems or external databases. This goes for both input and output.
   1. People or programs need to track machine failures and react to them by releasing those machines and requesting new ones.
   1. If the cluster is idle. some machines can be decommissioned.
@@ -104, +104 @@

     * HDFS is as reliable and efficient as in physical.
     * Virtualization can provide much higher hardware utilization by consolidating multiple Hadoop clusters and other workload on the same physical cluster
     * Higher performance for some workload (including terasort) than physical for typical 2 CPU socket Hadoop nodes due to better NUMA and disk scheduling
-    * Per tenant VLAN via SDN for better security than typical shared physical Hadoop cluster 
+    * Per tenant VLAN (VXLAN) for better security than typical shared physical Hadoop cluster 
   * Given the choice between a virtual Hadoop and no Hadoop, virtual Hadoop is compelling.
   * Using Apache Hadoop as your MapReduce infrastructure gives you Cloud vendor independence, and the option of moving to a permanent physical deployment later.
   * It is the only way to execute the tools that work with Hadoop and the layers above it in a Cloud environment.
@@ -129, +129 @@

  
  == Summary ==
  
- You can bring up Hadoop in virtualized infrastructures. Sometimes it even makes sense for public cloud, for development and production. For production use, be aware that the differences between physical and virtual infrastructures could pose additional gotchas to your data integrity and security without proper planning and provisioning. 
+ You can bring up Hadoop in virtualized infrastructures with many benefits. Sometimes it even makes sense for public cloud, for development and production. For production use, be aware that the differences between physical and virtual infrastructures could pose additional gotchas to your data integrity and security without proper planning and provisioning.