Posted to commits@spark.apache.org by pw...@apache.org on 2013/09/25 02:14:46 UTC

svn commit: r2978 [4/12] - in /dev/incubator/spark/spark-0.8.0-incubating-rc6-docs: ./ css/ img/ js/ js/vendor/

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/bootstrap.min.css
==============================================================================
--- dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/bootstrap.min.css (added)
+++ dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/bootstrap.min.css Wed Sep 25 00:14:43 2013
@@ -0,0 +1,9 @@
+/*!
+ * Bootstrap v2.1.0
+ *
+ * Copyright 2012 Twitter, Inc
+ * Licensed under the Apache License v2.0
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Designed and built with all the love in the world @twitter by @mdo and @fat.

[... 2 lines stripped ...]
Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/main.css
==============================================================================
--- dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/main.css (added)
+++ dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/main.css Wed Sep 25 00:14:43 2013
@@ -0,0 +1,106 @@
+/* ==========================================================================
+   Author's custom styles
+   ========================================================================== */
+
+.navbar .brand {
+  height: 50px;
+  width: 110px;
+  margin-left: 1px;
+  padding: 0;
+}
+
+.version {
+  line-height: 30px;
+  vertical-align: bottom;
+  font-size: 12px;
+  padding: 0;
+  margin: 0;
+  font-weight: bold;
+  color: #777;
+}
+
+.navbar-inner {
+  padding-top: 2px;
+  height: 50px;
+}
+
+.navbar-inner .nav {
+  margin-top: 5px;
+  font-size: 15px;
+}
+
+.navbar .divider-vertical {
+  border-right-color: lightgray;
+}
+
+.navbar-text .version-text {
+  color: #555555;
+  padding: 5px;
+  margin-left: 10px;
+}
+
+body #content {
+  line-height: 1.6; /* Inspired by Github's wiki style */
+}
+
+.title {
+  font-size: 32px;
+}
+
+h1 {
+  font-size: 28px;
+  margin-top: 12px;
+}
+
+h2 {
+  font-size: 24px;
+  margin-top: 12px;
+}
+
+h3 {
+  font-size: 21px;
+  margin-top: 10px;
+}
+
+pre {
+  font-family: "Menlo", "Lucida Console", monospace;
+}
+
+code {
+  font-family: "Menlo", "Lucida Console", monospace;
+  background: white;
+  border: none;
+  padding: 0;
+  color: #444444;
+}
+
+a code {
+  color: #0088cc;
+}
+
+a:hover code {
+  color: #005580;
+  text-decoration: underline;
+}
+
+.container {
+  max-width: 914px;
+}
+
+/**
+ * Make dropdown menus in nav bars show on hover instead of click
+ * using solution at http://stackoverflow.com/questions/8878033/how-
+ * to-make-twitter-bootstrap-menu-dropdown-on-hover-rather-than-click
+ **/
+.dropdown-menu {
+  /* Remove the default 2px top margin which causes a small
+    gap between the hover trigger area and the popup menu */
+  margin-top: 0;
+}
+ul.nav li.dropdown:hover ul.dropdown-menu{
+  display: block;
+}
+a.menu:after, .dropdown-toggle:after {
+  content: none;
+}
+

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/main.css
------------------------------------------------------------------------------
    svn:executable = *

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/pygments-default.css
==============================================================================
--- dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/pygments-default.css (added)
+++ dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/css/pygments-default.css Wed Sep 25 00:14:43 2013
@@ -0,0 +1,76 @@
+/*
+Documentation for pygments (and Jekyll for that matter) is super sparse.
+To generate this, I had to run
+  `pygmentize -S default -f html > pygments-default.css`
+But first I had to install pygments via easy_install pygments
+
+I had to override the conflicting bootstrap style rules by linking to
+this stylesheet lower in the html than the bootstrap css.
+
+Also, I was thrown off for a while at first when I was using markdown
+code block inside my {% highlight scala %} ... {% endhighlight %} tags
+(I was using 4 spaces for this), when it turns out that pygments will
+insert the code (or pre?) tags for you.
+*/
+
+.hll { background-color: #ffffcc }
+.c { color: #60a0b0; font-style: italic } /* Comment */
+.err { } /* Error */
+.k { color: #007020; font-weight: bold } /* Keyword */
+.o { color: #666666 } /* Operator */
+.cm { color: #60a0b0; font-style: italic } /* Comment.Multiline */
+.cp { color: #007020 } /* Comment.Preproc */
+.c1 { color: #60a0b0; font-style: italic } /* Comment.Single */
+.cs { color: #60a0b0; background-color: #fff0f0 } /* Comment.Special */
+.gd { color: #A00000 } /* Generic.Deleted */
+.ge { font-style: italic } /* Generic.Emph */
+.gr { color: #FF0000 } /* Generic.Error */
+.gh { color: #000080; font-weight: bold } /* Generic.Heading */
+.gi { color: #00A000 } /* Generic.Inserted */
+.go { color: #808080 } /* Generic.Output */
+.gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */
+.gs { font-weight: bold } /* Generic.Strong */
+.gu { color: #800080; font-weight: bold } /* Generic.Subheading */
+.gt { color: #0040D0 } /* Generic.Traceback */
+.kc { color: #007020; font-weight: bold } /* Keyword.Constant */
+.kd { color: #007020; font-weight: bold } /* Keyword.Declaration */
+.kn { color: #007020; font-weight: bold } /* Keyword.Namespace */
+.kp { color: #007020 } /* Keyword.Pseudo */
+.kr { color: #007020; font-weight: bold } /* Keyword.Reserved */
+.kt { color: #902000 } /* Keyword.Type */
+.m { color: #40a070 } /* Literal.Number */
+.s { color: #4070a0 } /* Literal.String */
+.na { color: #4070a0 } /* Name.Attribute */
+.nb { color: #007020 } /* Name.Builtin */
+.nc { color: #0e84b5; font-weight: bold } /* Name.Class */
+.no { color: #60add5 } /* Name.Constant */
+.nd { color: #555555; font-weight: bold } /* Name.Decorator */
+.ni { color: #d55537; font-weight: bold } /* Name.Entity */
+.ne { color: #007020 } /* Name.Exception */
+.nf { color: #06287e } /* Name.Function */
+.nl { color: #002070; font-weight: bold } /* Name.Label */
+.nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */
+.nt { color: #062873; font-weight: bold } /* Name.Tag */
+.nv { color: #bb60d5 } /* Name.Variable */
+.ow { color: #007020; font-weight: bold } /* Operator.Word */
+.w { color: #bbbbbb } /* Text.Whitespace */
+.mf { color: #40a070 } /* Literal.Number.Float */
+.mh { color: #40a070 } /* Literal.Number.Hex */
+.mi { color: #40a070 } /* Literal.Number.Integer */
+.mo { color: #40a070 } /* Literal.Number.Oct */
+.sb { color: #4070a0 } /* Literal.String.Backtick */
+.sc { color: #4070a0 } /* Literal.String.Char */
+.sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */
+.s2 { color: #4070a0 } /* Literal.String.Double */
+.se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */
+.sh { color: #4070a0 } /* Literal.String.Heredoc */
+.si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */
+.sx { color: #c65d09 } /* Literal.String.Other */
+.sr { color: #235388 } /* Literal.String.Regex */
+.s1 { color: #4070a0 } /* Literal.String.Single */
+.ss { color: #517918 } /* Literal.String.Symbol */
+.bp { color: #007020 } /* Name.Builtin.Pseudo */
+.vc { color: #bb60d5 } /* Name.Variable.Class */
+.vg { color: #bb60d5 } /* Name.Variable.Global */
+.vi { color: #bb60d5 } /* Name.Variable.Instance */
+.il { color: #40a070 } /* Literal.Number.Integer.Long */
\ No newline at end of file

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/ec2-scripts.html
==============================================================================
--- dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/ec2-scripts.html (added)
+++ dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/ec2-scripts.html Wed Sep 25 00:14:43 2013
@@ -0,0 +1,352 @@
+<!DOCTYPE html>
+<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
+<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
+<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
+    <head>
+        <meta charset="utf-8">
+        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
+        <title>Running Spark on EC2 - Spark 0.8.0 Documentation</title>
+        <meta name="description" content="">
+
+        <link rel="stylesheet" href="css/bootstrap.min.css">
+        <style>
+            body {
+                padding-top: 60px;
+                padding-bottom: 40px;
+            }
+        </style>
+        <meta name="viewport" content="width=device-width">
+        <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
+        <link rel="stylesheet" href="css/main.css">
+
+        <script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+        
+        <link rel="stylesheet" href="css/pygments-default.css">
+
+        <!-- Google analytics script -->
+        <script type="text/javascript">
+          /*
+          var _gaq = _gaq || [];
+          _gaq.push(['_setAccount', 'UA-32518208-1']);
+          _gaq.push(['_trackPageview']);
+
+          (function() {
+            var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+            ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+          })();
+          */
+        </script>
+
+    </head>
+    <body>
+        <!--[if lt IE 7]>
+            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
+        <![endif]-->
+
+        <!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html -->
+
+        <div class="navbar navbar-fixed-top" id="topbar">
+            <div class="navbar-inner">
+                <div class="container">
+                    <div class="brand"><a href="index.html">
+                      <img src="img/spark-logo-hd.png" style="height:50px;"/></a><span class="version">0.8.0</span>
+                    </div>
+                    <ul class="nav">
+                        <!--TODO(andyk): Add class="active" attribute to li somehow.-->
+                        <li><a href="index.html">Overview</a></li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guides<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="quick-start.html">Quick Start</a></li>
+                                <li><a href="scala-programming-guide.html">Spark in Scala</a></li>
+                                <li><a href="java-programming-guide.html">Spark in Java</a></li>
+                                <li><a href="python-programming-guide.html">Spark in Python</a></li>
+                                <li class="divider"></li>
+                                <li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
+                                <li><a href="mllib-guide.html">MLlib (Machine Learning)</a></li>
+                                <li><a href="bagel-programming-guide.html">Bagel (Pregel on Spark)</a></li>
+                            </ul>
+                        </li>
+                        
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">API Docs<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="api/core/index.html">Spark Core for Java/Scala</a></li>
+                                <li><a href="api/pyspark/index.html">Spark Core for Python</a></li>
+                                <li class="divider"></li>
+                                <li><a href="api/streaming/index.html">Spark Streaming</a></li>
+                                <li><a href="api/mllib/index.html">MLlib (Machine Learning)</a></li>
+                                <li><a href="api/bagel/index.html">Bagel (Pregel on Spark)</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Deploying<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="cluster-overview.html">Overview</a></li>
+                                <li><a href="ec2-scripts.html">Amazon EC2</a></li>
+                                <li><a href="spark-standalone.html">Standalone Mode</a></li>
+                                <li><a href="running-on-mesos.html">Mesos</a></li>
+                                <li><a href="running-on-yarn.html">YARN</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="api.html" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="configuration.html">Configuration</a></li>
+                                <li><a href="monitoring.html">Monitoring</a></li>
+                                <li><a href="tuning.html">Tuning Guide</a></li>
+                                <li><a href="hadoop-third-party-distributions.html">Running with CDH/HDP</a></li>
+                                <li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
+                                <li><a href="job-scheduling.html">Job Scheduling</a></li>
+                                <li class="divider"></li>
+                                <li><a href="building-with-maven.html">Building Spark with Maven</a></li>
+                                <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark</a></li>
+                            </ul>
+                        </li>
+                    </ul>
+                    <!--<p class="navbar-text pull-right"><span class="version-text">v0.8.0</span></p>-->
+                </div>
+            </div>
+        </div>
+
+        <div class="container" id="content">
+          <h1 class="title">Running Spark on EC2</h1>
+
+          <p>The <code>spark-ec2</code> script, located in Spark&#8217;s <code>ec2</code> directory, allows you
+to launch, manage and shut down Spark clusters on Amazon EC2. It automatically
+sets up Spark, Shark and HDFS on the cluster for you. This guide describes 
+how to use <code>spark-ec2</code> to launch clusters, how to run jobs on them, and how 
+to shut them down. It assumes you&#8217;ve already signed up for an EC2 account 
+on the <a href="http://aws.amazon.com/">Amazon Web Services site</a>.</p>
+
+<p><code>spark-ec2</code> is designed to manage multiple named clusters. You can
+launch a new cluster (telling the script its size and giving it a name),
+shutdown an existing cluster, or log into a cluster. Each cluster is
+identified by placing its machines into EC2 security groups whose names
+are derived from the name of the cluster. For example, a cluster named
+<code>test</code> will contain a master node in a security group called
+<code>test-master</code>, and a number of slave nodes in a security group called
+<code>test-slaves</code>. The <code>spark-ec2</code> script will create these security groups
+for you based on the cluster name you request. You can also use them to
+identify machines belonging to each cluster in the Amazon EC2 Console.</p>
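
[Editorial note: the group-naming convention described above can be sketched in a few lines of shell; `test` is the example cluster name from the text.]

```shell
# How spark-ec2 derives security group names from the cluster name
CLUSTER=test
MASTER_GROUP="${CLUSTER}-master"
SLAVES_GROUP="${CLUSTER}-slaves"
echo "$MASTER_GROUP $SLAVES_GROUP"
```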
+
+<h1 id="before-you-start">Before You Start</h1>
+
+<ul>
+  <li>Create an Amazon EC2 key pair for yourself. This can be done by
+logging into your Amazon Web Services account through the <a href="http://aws.amazon.com/console/">AWS
+console</a>, clicking Key Pairs on the
+left sidebar, and creating and downloading a key. Make sure that you
+set the permissions for the private key file to <code>600</code> (i.e. only you
+can read and write it) so that <code>ssh</code> will work.</li>
+  <li>Whenever you want to use the <code>spark-ec2</code> script, set the environment
+variables <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> to your
+Amazon EC2 access key ID and secret access key. These can be
+obtained from the <a href="http://aws.amazon.com/">AWS homepage</a> by clicking
+Account &gt; Security Credentials &gt; Access Credentials.</li>
+</ul>
+
+<h1 id="launching-a-cluster">Launching a Cluster</h1>
+
+<ul>
+  <li>Go into the <code>ec2</code> directory in the release of Spark you downloaded.</li>
+  <li>Run
+<code>./spark-ec2 -k &lt;keypair&gt; -i &lt;key-file&gt; -s &lt;num-slaves&gt; launch &lt;cluster-name&gt;</code>,
+where <code>&lt;keypair&gt;</code> is the name of your EC2 key pair (that you gave it
+when you created it), <code>&lt;key-file&gt;</code> is the private key file for your
+key pair, <code>&lt;num-slaves&gt;</code> is the number of slave nodes to launch (try
+1 at first), and <code>&lt;cluster-name&gt;</code> is the name to give to your
+cluster.</li>
+  <li>After everything launches, check that the cluster scheduler is up and sees
+all the slaves by going to its web UI, which will be printed at the end of
+the script (typically <code>http://&lt;master-hostname&gt;:8080</code>).</li>
+</ul>
+
+<p>You can also run <code>./spark-ec2 --help</code> to see more usage options. The
+following options are worth pointing out:</p>
+
+<ul>
+  <li><code>--instance-type=&lt;INSTANCE_TYPE&gt;</code> can be used to specify an EC2
+instance type to use. For now, the script only supports 64-bit instance
+types, and the default type is <code>m1.large</code> (which has 2 cores and 7.5 GB
+RAM). Refer to the Amazon pages about <a href="http://aws.amazon.com/ec2/instance-types">EC2 instance
+types</a> and <a href="http://aws.amazon.com/ec2/#pricing">EC2
+pricing</a> for information about other
+instance types. </li>
+  <li><code>--region=&lt;EC2_REGION&gt;</code> specifies an EC2 region in which to launch
+instances. The default region is <code>us-east-1</code>.</li>
+  <li><code>--zone=&lt;EC2_ZONE&gt;</code> can be used to specify an EC2 availability zone
+to launch instances in. Sometimes, you will get an error because there
+is not enough capacity in one zone, and you should try to launch in
+another.</li>
+  <li><code>--ebs-vol-size=GB</code> will attach an EBS volume with a given amount
+of space to each node so that you can have a persistent HDFS cluster
+on your nodes across cluster restarts (see below).</li>
+  <li><code>--spot-price=PRICE</code> will launch the worker nodes as
+<a href="http://aws.amazon.com/ec2/spot-instances/">Spot Instances</a>,
+bidding for the given maximum price (in dollars).</li>
+  <li><code>--spark-version=VERSION</code> will pre-load the cluster with the
+specified version of Spark. VERSION can be a version number
+(e.g. &#8220;0.7.3&#8221;) or a specific git hash. By default, a recent
+version will be used.</li>
+  <li>If one of your launches fails due to e.g. not having the right
+permissions on your private key file, you can run <code>launch</code> with the
+<code>--resume</code> option to restart the setup process on an existing cluster.</li>
+</ul>
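
[Editorial note: a sketch of a launch command combining several of the options above; the keypair name, key file, and cluster name are hypothetical.]

```shell
# Assemble and display a full launch command (values are illustrative)
KEYPAIR=my-keypair
KEYFILE=my-keypair.pem
CLUSTER=my-test-cluster
echo "./spark-ec2 -k $KEYPAIR -i $KEYFILE -s 1 --instance-type=m1.large --region=us-east-1 launch $CLUSTER"
```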
+
+<h1 id="running-applications">Running Applications</h1>
+
+<ul>
+  <li>Go into the <code>ec2</code> directory in the release of Spark you downloaded.</li>
+  <li>Run <code>./spark-ec2 -k &lt;keypair&gt; -i &lt;key-file&gt; login &lt;cluster-name&gt;</code> to
+SSH into the cluster, where <code>&lt;keypair&gt;</code> and <code>&lt;key-file&gt;</code> are as
+above. (This is just for convenience; you could also use
+the EC2 console.)</li>
+  <li>To deploy code or data within your cluster, you can log in and use the
+provided script <code>~/spark-ec2/copy-dir</code>, which,
+given a directory path, RSYNCs it to the same location on all the slaves.</li>
+  <li>If your application needs to access large datasets, the fastest way to do
+that is to load them from Amazon S3 or an Amazon EBS device into an
+instance of the Hadoop Distributed File System (HDFS) on your nodes.
+The <code>spark-ec2</code> script already sets up an HDFS instance for you. It&#8217;s
+installed in <code>/root/ephemeral-hdfs</code>, and can be accessed using the
+<code>bin/hadoop</code> script in that directory. Note that the data in this
+HDFS goes away when you stop and restart a machine.</li>
+  <li>There is also a <em>persistent HDFS</em> instance in
+<code>/root/persistent-hdfs</code> that will keep data across cluster restarts.
+Typically each node has relatively little space for persistent data
+(about 3 GB), but you can use the <code>--ebs-vol-size</code> option to
+<code>spark-ec2</code> to attach a persistent EBS volume to each node for
+storing the persistent HDFS.</li>
+  <li>Finally, if you get errors while running your application, look at the slave&#8217;s logs
+for that application inside of the scheduler work directory (/root/spark/work). You can
+also view the status of the cluster using the web UI: <code>http://&lt;master-hostname&gt;:8080</code>.</li>
+</ul>

+
+<h1 id="configuration">Configuration</h1>
+
+<p>You can edit <code>/root/spark/conf/spark-env.sh</code> on each machine to set Spark configuration options, such
+as JVM options. This file needs to be copied to <strong>every machine</strong> to reflect the change. The easiest way to
+do this is to use a script we provide called <code>copy-dir</code>. First edit your <code>spark-env.sh</code> file on the master, 
+then run <code>~/spark-ec2/copy-dir /root/spark/conf</code> to RSYNC it to all the workers.</p>
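
[Editorial note: a sketch of the edit-then-sync workflow described above, run against a scratch directory for demonstration; on a real cluster the path is <code>/root/spark/conf</code> and you would follow with <code>~/spark-ec2/copy-dir /root/spark/conf</code>. The particular option appended is illustrative.]

```shell
# Append an option to spark-env.sh in a scratch copy of the conf directory
CONF=$(mktemp -d)
echo 'export SPARK_JAVA_OPTS="-Dspark.akka.frameSize=20"' >> "$CONF/spark-env.sh"
cat "$CONF/spark-env.sh"
# On the cluster: ~/spark-ec2/copy-dir /root/spark/conf
```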
+
+<p>The <a href="configuration.html">configuration guide</a> describes the available configuration options.</p>
+
+<h1 id="terminating-a-cluster">Terminating a Cluster</h1>
+
+<p><strong><em>Note that there is no way to recover data on EC2 nodes after shutting
+them down! Make sure you have copied everything important off the nodes
+before stopping them.</em></strong></p>
+
+<ul>
+  <li>Go into the <code>ec2</code> directory in the release of Spark you downloaded.</li>
+  <li>Run <code>./spark-ec2 destroy &lt;cluster-name&gt;</code>.</li>
+</ul>
+
+<h1 id="pausing-and-restarting-clusters">Pausing and Restarting Clusters</h1>
+
+<p>The <code>spark-ec2</code> script also supports pausing a cluster. In this case,
+the VMs are stopped but not terminated, so they
+<strong><em>lose all data on ephemeral disks</em></strong> but keep the data in their
+root partitions and their <code>persistent-hdfs</code>. Stopped machines will not
+cost you any EC2 cycles, but <strong><em>will</em></strong> continue to cost money for EBS
+storage.</p>
+
+<ul>
+  <li>To stop one of your clusters, go into the <code>ec2</code> directory and run
+<code>./spark-ec2 stop &lt;cluster-name&gt;</code>.</li>
+  <li>To restart it later, run
+<code>./spark-ec2 -i &lt;key-file&gt; start &lt;cluster-name&gt;</code>.</li>
+  <li>To ultimately destroy the cluster and stop consuming EBS space, run
+<code>./spark-ec2 destroy &lt;cluster-name&gt;</code> as described in the previous
+section.</li>
+</ul>
+
+<h1 id="limitations">Limitations</h1>
+
+<ul>
+  <li>Support for &#8220;cluster compute&#8221; nodes is limited &#8211; there&#8217;s no way to specify a
+locality group. However, you can launch slave nodes in your
+<code>&lt;clusterName&gt;-slaves</code> group manually and then use <code>spark-ec2 launch
+--resume</code> to start a cluster with them.</li>
+</ul>
+
+<p>If you have a patch or suggestion for one of these limitations, feel free to
+<a href="contributing-to-spark.html">contribute</a> it!</p>
+
+<h1 id="accessing-data-in-s3">Accessing Data in S3</h1>
+
+<p>Spark&#8217;s file interface allows it to process data in Amazon S3 using the same URI formats that are supported for Hadoop. You can specify a path in S3 as input through a URI of the form <code>s3n://&lt;bucket&gt;/path</code>. You will also need to set your Amazon security credentials, either by setting the environment variables <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> before your program or through <code>SparkContext.hadoopConfiguration</code>. Full instructions on S3 access using the Hadoop input libraries can be found on the <a href="http://wiki.apache.org/hadoop/AmazonS3">Hadoop S3 page</a>.</p>
+
+<p>In addition to using a single input file, you can also use a directory of files as input by simply giving the path to the directory.</p>
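
[Editorial note: a minimal sketch of the environment-variable approach described above; the credentials and bucket name are hypothetical placeholders, not working values.]

```shell
# Set credentials before starting your program, then reference an s3n:// path
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEY"
export AWS_SECRET_ACCESS_KEY="exampleSecretKey"
INPUT_PATH="s3n://my-bucket/logs/"   # a directory path works as input too
echo "$INPUT_PATH"
```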
+
+            <!-- Main hero unit for a primary marketing message or call to action -->
+            <!--<div class="hero-unit">
+                <h1>Hello, world!</h1>
+                <p>This is a template for a simple marketing or informational website. It includes a large callout called the hero unit and three supporting pieces of content. Use it as a starting point to create something more unique.</p>
+                <p><a class="btn btn-primary btn-large">Learn more &raquo;</a></p>
+            </div>-->
+
+            <!-- Example row of columns -->
+            <!--<div class="row">
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+                </div>
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+               </div>
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec sed odio dui. Cras justo odio, dapibus ac facilisis in, egestas eget quam. Vestibulum id ligula porta felis euismod semper. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.</p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+                </div>
+            </div>
+
+            <hr>-->
+
+            <footer>
+              <hr>
+              <p style="text-align: center; vertical-align: middle; color: #999;">
+                Apache Spark is an effort undergoing incubation at the Apache Software Foundation.
+                <a href="http://incubator.apache.org">
+                  <img style="margin-left: 20px;" src="img/incubator-logo.png" />
+                </a>
+              </p>
+            </footer>
+
+        </div> <!-- /container -->
+
+        <script src="js/vendor/jquery-1.8.0.min.js"></script>
+        <script src="js/vendor/bootstrap.min.js"></script>
+        <script src="js/main.js"></script>
+        
+        <!-- A script to fix internal hash links because we have an overlapping top bar.
+             Based on https://github.com/twitter/bootstrap/issues/193#issuecomment-2281510 -->
+        <script>
+          $(function() {
+            function maybeScrollToHash() {
+              if (window.location.hash && $(window.location.hash).length) {
+                var newTop = $(window.location.hash).offset().top - $('#topbar').height() - 5;
+                $(window).scrollTop(newTop);
+              }
+            }
+            $(window).bind('hashchange', function() {
+              maybeScrollToHash();
+            });
+            // Scroll now too in case we had opened the page on a hash, but wait 1 ms because some browsers
+            // will try to do *their* initial scroll after running the onReady handler.
+            setTimeout(function() { maybeScrollToHash(); }, 1)
+          })
+        </script>
+
+    </body>
+</html>

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/hadoop-third-party-distributions.html
==============================================================================
--- dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/hadoop-third-party-distributions.html (added)
+++ dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/hadoop-third-party-distributions.html Wed Sep 25 00:14:43 2013
@@ -0,0 +1,302 @@
+<!DOCTYPE html>
+<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
+<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
+<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
+    <head>
+        <meta charset="utf-8">
+        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
+        <title>Running with Cloudera and HortonWorks - Spark 0.8.0 Documentation</title>
+        <meta name="description" content="">
+
+        <link rel="stylesheet" href="css/bootstrap.min.css">
+        <style>
+            body {
+                padding-top: 60px;
+                padding-bottom: 40px;
+            }
+        </style>
+        <meta name="viewport" content="width=device-width">
+        <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
+        <link rel="stylesheet" href="css/main.css">
+
+        <script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+        
+        <link rel="stylesheet" href="css/pygments-default.css">
+
+        <!-- Google analytics script -->
+        <script type="text/javascript">
+          /*
+          var _gaq = _gaq || [];
+          _gaq.push(['_setAccount', 'UA-32518208-1']);
+          _gaq.push(['_trackPageview']);
+
+          (function() {
+            var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+            ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+          })();
+          */
+        </script>
+
+    </head>
+    <body>
+        <!--[if lt IE 7]>
+            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
+        <![endif]-->
+
+        <!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html -->
+
+        <div class="navbar navbar-fixed-top" id="topbar">
+            <div class="navbar-inner">
+                <div class="container">
+                    <div class="brand"><a href="index.html">
+                      <img src="img/spark-logo-hd.png" style="height:50px;"/></a><span class="version">0.8.0</span>
+                    </div>
+                    <ul class="nav">
+                        <!--TODO(andyk): Add class="active" attribute to li somehow.-->
+                        <li><a href="index.html">Overview</a></li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guides<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="quick-start.html">Quick Start</a></li>
+                                <li><a href="scala-programming-guide.html">Spark in Scala</a></li>
+                                <li><a href="java-programming-guide.html">Spark in Java</a></li>
+                                <li><a href="python-programming-guide.html">Spark in Python</a></li>
+                                <li class="divider"></li>
+                                <li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
+                                <li><a href="mllib-guide.html">MLlib (Machine Learning)</a></li>
+                                <li><a href="bagel-programming-guide.html">Bagel (Pregel on Spark)</a></li>
+                            </ul>
+                        </li>
+                        
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">API Docs<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="api/core/index.html">Spark Core for Java/Scala</a></li>
+                                <li><a href="api/pyspark/index.html">Spark Core for Python</a></li>
+                                <li class="divider"></li>
+                                <li><a href="api/streaming/index.html">Spark Streaming</a></li>
+                                <li><a href="api/mllib/index.html">MLlib (Machine Learning)</a></li>
+                                <li><a href="api/bagel/index.html">Bagel (Pregel on Spark)</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Deploying<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="cluster-overview.html">Overview</a></li>
+                                <li><a href="ec2-scripts.html">Amazon EC2</a></li>
+                                <li><a href="spark-standalone.html">Standalone Mode</a></li>
+                                <li><a href="running-on-mesos.html">Mesos</a></li>
+                                <li><a href="running-on-yarn.html">YARN</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="api.html" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="configuration.html">Configuration</a></li>
+                                <li><a href="monitoring.html">Monitoring</a></li>
+                                <li><a href="tuning.html">Tuning Guide</a></li>
+                                <li><a href="hadoop-third-party-distributions.html">Running with CDH/HDP</a></li>
+                                <li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
+                                <li><a href="job-scheduling.html">Job Scheduling</a></li>
+                                <li class="divider"></li>
+                                <li><a href="building-with-maven.html">Building Spark with Maven</a></li>
+                                <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark</a></li>
+                            </ul>
+                        </li>
+                    </ul>
+                    <!--<p class="navbar-text pull-right"><span class="version-text">v0.8.0</span></p>-->
+                </div>
+            </div>
+        </div>
+
+        <div class="container" id="content">
+          <h1 class="title">Running with Cloudera and Hortonworks</h1>
+
+          <p>Spark can run against all versions of Cloudera&#8217;s Distribution Including Apache Hadoop (CDH) and
+the Hortonworks Data Platform (HDP). There are a few things to keep in mind when using Spark
+with these distributions:</p>
+
+<h1 id="compile-time-hadoop-version">Compile-time Hadoop Version</h1>
+
+<p>When compiling Spark, you&#8217;ll need to 
+<a href="index.html#a-note-about-hadoop-versions">set the SPARK_HADOOP_VERSION flag</a>:</p>
+
+<pre><code>SPARK_HADOOP_VERSION=1.0.4 sbt/sbt assembly
+</code></pre>
+
+<p>The table below lists the corresponding <code>SPARK_HADOOP_VERSION</code> code for each CDH/HDP release. Note that
+some Hadoop releases are binary compatible across client versions. This means the pre-built Spark
+distribution may &#8220;just work&#8221; without you needing to compile. That said, we recommend compiling with 
+the <em>exact</em> Hadoop version you are running to avoid any compatibility errors.</p>
+
+<table>
+  <tr valign="top">
+    <td>
+      <h3>CDH Releases</h3>
+      <table class="table" style="width:350px; margin-right: 20px;">
+        <tr><th>Release</th><th>Version code</th></tr>
+        <tr><td>CDH 4.X.X (YARN mode)</td><td>2.0.0-cdh4.X.X</td></tr>
+        <tr><td>CDH 4.X.X</td><td>2.0.0-mr1-cdh4.X.X</td></tr>
+        <tr><td>CDH 3u6</td><td>0.20.2-cdh3u6</td></tr>
+        <tr><td>CDH 3u5</td><td>0.20.2-cdh3u5</td></tr>
+        <tr><td>CDH 3u4</td><td>0.20.2-cdh3u4</td></tr>
+      </table>
+    </td>
+    <td>
+      <h3>HDP Releases</h3>
+      <table class="table" style="width:350px;">
+        <tr><th>Release</th><th>Version code</th></tr>
+        <tr><td>HDP 1.3</td><td>1.2.0</td></tr>
+        <tr><td>HDP 1.2</td><td>1.1.2</td></tr>
+        <tr><td>HDP 1.1</td><td>1.0.3</td></tr>
+        <tr><td>HDP 1.0</td><td>1.0.3</td></tr>
+      </table>
+    </td>
+  </tr>
+</table>
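+<p>For example, to build against a CDH4 MRv1 cluster, you would substitute the version code from
+the table above into the same command (the <code>4.2.0</code> patch level here is an illustrative
+assumption; use the release your cluster actually runs):</p>
+
+<pre><code># Replace 4.2.0 with your cluster's actual CDH patch release
+SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly
+</code></pre>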
+
+<h1 id="linking-applications-to-the-hadoop-version">Linking Applications to the Hadoop Version</h1>
+
+<p>In addition to compiling Spark itself against the right version, you need to add a Maven dependency on that
+version of <code>hadoop-client</code> to any Spark applications you run, so they can also talk to the HDFS version
+on the cluster. If you are using CDH, you also need to add the Cloudera Maven repository.
+This looks as follows in SBT:</p>
+
+<div class="highlight"><pre><code class="scala"><span class="n">libraryDependencies</span> <span class="o">+=</span> <span class="s">&quot;org.apache.hadoop&quot;</span> <span class="o">%</span> <span class="s">&quot;hadoop-client&quot;</span> <span class="o">%</span> <span class="s">&quot;&lt;version&gt;&quot;</span>
+
+<span class="c1">// If using CDH, also add Cloudera repo</span>
+<span class="n">resolvers</span> <span class="o">+=</span> <span class="s">&quot;Cloudera Repository&quot;</span> <span class="n">at</span> <span class="s">&quot;https://repository.cloudera.com/artifactory/cloudera-repos/&quot;</span>
+</code></pre></div>
+
+<p>Or in Maven:</p>
+
+<div class="highlight"><pre><code class="xml"><span class="nt">&lt;project&gt;</span>
+  <span class="nt">&lt;dependencies&gt;</span>
+    ...
+    <span class="nt">&lt;dependency&gt;</span>
+      <span class="nt">&lt;groupId&gt;</span>org.apache.hadoop<span class="nt">&lt;/groupId&gt;</span>
+      <span class="nt">&lt;artifactId&gt;</span>hadoop-client<span class="nt">&lt;/artifactId&gt;</span>
+      <span class="nt">&lt;version&gt;</span>[version]<span class="nt">&lt;/version&gt;</span>
+    <span class="nt">&lt;/dependency&gt;</span>
+  <span class="nt">&lt;/dependencies&gt;</span>
+
+  <span class="c">&lt;!-- If using CDH, also add Cloudera repo --&gt;</span>
+  <span class="nt">&lt;repositories&gt;</span>
+    ...
+    <span class="nt">&lt;repository&gt;</span>
+      <span class="nt">&lt;id&gt;</span>Cloudera repository<span class="nt">&lt;/id&gt;</span>
+      <span class="nt">&lt;url&gt;</span>https://repository.cloudera.com/artifactory/cloudera-repos/<span class="nt">&lt;/url&gt;</span>
+    <span class="nt">&lt;/repository&gt;</span>
+  <span class="nt">&lt;/repositories&gt;</span>
+<span class="nt">&lt;/project&gt;</span>
+</code></pre></div>
+
+<h1 id="where-to-run-spark">Where to Run Spark</h1>
+
+<p>As described in the <a href="hardware-provisioning.html#storage-systems">Hardware Provisioning</a> guide,
+Spark can run in a variety of deployment modes:</p>
+
+<ul>
+  <li>Using a dedicated set of Spark nodes in your cluster. These nodes should be co-located with your
+Hadoop installation.</li>
+  <li>Running on the same nodes as an existing Hadoop installation, with a fixed amount of memory and
+cores dedicated to Spark on each node.</li>
+  <li>Running Spark alongside Hadoop using a cluster resource manager, such as YARN or Mesos.</li>
+</ul>
+
+<p>These options are the same whether you use CDH or HDP.</p>
+
+<h1 id="inheriting-cluster-configuration">Inheriting Cluster Configuration</h1>
+
+<p>If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that
+should be included on Spark&#8217;s classpath:</p>
+
+<ul>
+  <li><code>hdfs-site.xml</code>, which provides default behaviors for the HDFS client.</li>
+  <li><code>core-site.xml</code>, which sets the default filesystem name.</li>
+</ul>
+
+<p>The location of these configuration files varies across CDH and HDP versions, but
+a common location is inside of <code>/etc/hadoop/conf</code>. Some tools, such as Cloudera Manager, create
+configurations on-the-fly, but offer a mechanism to download copies of them.</p>
+
+<p>There are a few ways to make these files visible to Spark:</p>
+
+<ul>
+  <li>You can copy these files into <code>$SPARK_HOME/conf</code> and they will be included in Spark&#8217;s
+classpath automatically.</li>
+  <li>If you are running Spark on the same nodes as Hadoop <em>and</em> your distribution includes both
+<code>hdfs-site.xml</code> and <code>core-site.xml</code> in the same directory, you can set <code>HADOOP_CONF_DIR</code> 
+in <code>$SPARK_HOME/spark-env.sh</code> to that directory.</li>
+</ul>
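+<p>As a concrete sketch of the second option (the <code>/etc/hadoop/conf</code> path is a common
+default, not a guarantee; check where your distribution actually places these files):</p>
+
+<pre><code># In $SPARK_HOME/conf/spark-env.sh
+export HADOOP_CONF_DIR=/etc/hadoop/conf
+
+# Or, equivalently, copy the two files onto Spark's classpath:
+# cp /etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf/core-site.xml $SPARK_HOME/conf/
+</code></pre>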
+
+            <!-- Main hero unit for a primary marketing message or call to action -->
+            <!--<div class="hero-unit">
+                <h1>Hello, world!</h1>
+                <p>This is a template for a simple marketing or informational website. It includes a large callout called the hero unit and three supporting pieces of content. Use it as a starting point to create something more unique.</p>
+                <p><a class="btn btn-primary btn-large">Learn more &raquo;</a></p>
+            </div>-->
+
+            <!-- Example row of columns -->
+            <!--<div class="row">
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+                </div>
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+               </div>
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec sed odio dui. Cras justo odio, dapibus ac facilisis in, egestas eget quam. Vestibulum id ligula porta felis euismod semper. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.</p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+                </div>
+            </div>
+
+            <hr>-->
+
+            <footer>
+              <hr>
+              <p style="text-align: center; vertical-align: middle; color: #999;">
+                Apache Spark is an effort undergoing incubation at the Apache Software Foundation.
+                <a href="http://incubator.apache.org">
+                  <img style="margin-left: 20px;" src="img/incubator-logo.png" />
+                </a>
+              </p>
+            </footer>
+
+        </div> <!-- /container -->
+
+        <script src="js/vendor/jquery-1.8.0.min.js"></script>
+        <script src="js/vendor/bootstrap.min.js"></script>
+        <script src="js/main.js"></script>
+        
+        <!-- A script to fix internal hash links because we have an overlapping top bar.
+             Based on https://github.com/twitter/bootstrap/issues/193#issuecomment-2281510 -->
+        <script>
+          $(function() {
+            function maybeScrollToHash() {
+              if (window.location.hash && $(window.location.hash).length) {
+                var newTop = $(window.location.hash).offset().top - $('#topbar').height() - 5;
+                $(window).scrollTop(newTop);
+              }
+            }
+            $(window).bind('hashchange', function() {
+              maybeScrollToHash();
+            });
+            // Scroll now too in case we had opened the page on a hash, but wait 1 ms because some browsers
+            // will try to do *their* initial scroll after running the onReady handler.
+            setTimeout(function() { maybeScrollToHash(); }, 1)
+          })
+        </script>
+
+    </body>
+</html>

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/hardware-provisioning.html
==============================================================================
--- dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/hardware-provisioning.html (added)
+++ dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/hardware-provisioning.html Wed Sep 25 00:14:43 2013
@@ -0,0 +1,255 @@
+<!DOCTYPE html>
+<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
+<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
+<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
+    <head>
+        <meta charset="utf-8">
+        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
+        <title>Hardware Provisioning - Spark 0.8.0 Documentation</title>
+        <meta name="description" content="">
+
+        <link rel="stylesheet" href="css/bootstrap.min.css">
+        <style>
+            body {
+                padding-top: 60px;
+                padding-bottom: 40px;
+            }
+        </style>
+        <meta name="viewport" content="width=device-width">
+        <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
+        <link rel="stylesheet" href="css/main.css">
+
+        <script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+        
+        <link rel="stylesheet" href="css/pygments-default.css">
+
+        <!-- Google analytics script -->
+        <script type="text/javascript">
+          /*
+          var _gaq = _gaq || [];
+          _gaq.push(['_setAccount', 'UA-32518208-1']);
+          _gaq.push(['_trackPageview']);
+
+          (function() {
+            var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+            ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+          })();
+          */
+        </script>
+
+    </head>
+    <body>
+        <!--[if lt IE 7]>
+            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
+        <![endif]-->
+
+        <!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html -->
+
+        <div class="navbar navbar-fixed-top" id="topbar">
+            <div class="navbar-inner">
+                <div class="container">
+                    <div class="brand"><a href="index.html">
+                      <img src="img/spark-logo-hd.png" style="height:50px;"/></a><span class="version">0.8.0</span>
+                    </div>
+                    <ul class="nav">
+                        <!--TODO(andyk): Add class="active" attribute to li somehow.-->
+                        <li><a href="index.html">Overview</a></li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guides<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="quick-start.html">Quick Start</a></li>
+                                <li><a href="scala-programming-guide.html">Spark in Scala</a></li>
+                                <li><a href="java-programming-guide.html">Spark in Java</a></li>
+                                <li><a href="python-programming-guide.html">Spark in Python</a></li>
+                                <li class="divider"></li>
+                                <li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
+                                <li><a href="mllib-guide.html">MLlib (Machine Learning)</a></li>
+                                <li><a href="bagel-programming-guide.html">Bagel (Pregel on Spark)</a></li>
+                            </ul>
+                        </li>
+                        
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">API Docs<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="api/core/index.html">Spark Core for Java/Scala</a></li>
+                                <li><a href="api/pyspark/index.html">Spark Core for Python</a></li>
+                                <li class="divider"></li>
+                                <li><a href="api/streaming/index.html">Spark Streaming</a></li>
+                                <li><a href="api/mllib/index.html">MLlib (Machine Learning)</a></li>
+                                <li><a href="api/bagel/index.html">Bagel (Pregel on Spark)</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Deploying<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="cluster-overview.html">Overview</a></li>
+                                <li><a href="ec2-scripts.html">Amazon EC2</a></li>
+                                <li><a href="spark-standalone.html">Standalone Mode</a></li>
+                                <li><a href="running-on-mesos.html">Mesos</a></li>
+                                <li><a href="running-on-yarn.html">YARN</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="api.html" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="configuration.html">Configuration</a></li>
+                                <li><a href="monitoring.html">Monitoring</a></li>
+                                <li><a href="tuning.html">Tuning Guide</a></li>
+                                <li><a href="hadoop-third-party-distributions.html">Running with CDH/HDP</a></li>
+                                <li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
+                                <li><a href="job-scheduling.html">Job Scheduling</a></li>
+                                <li class="divider"></li>
+                                <li><a href="building-with-maven.html">Building Spark with Maven</a></li>
+                                <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark</a></li>
+                            </ul>
+                        </li>
+                    </ul>
+                    <!--<p class="navbar-text pull-right"><span class="version-text">v0.8.0</span></p>-->
+                </div>
+            </div>
+        </div>
+
+        <div class="container" id="content">
+          <h1 class="title">Hardware Provisioning</h1>
+
+          <p>A common question received by Spark developers is how to configure hardware for it. While the right
+hardware will depend on the situation, we make the following recommendations.</p>
+
+<h1 id="storage-systems">Storage Systems</h1>
+
+<p>Because most Spark jobs will likely have to read input data from an external storage system (e.g.
+the Hadoop File System, or HBase), it is important to place Spark <strong>as close to this system as
+possible</strong>. We recommend the following:</p>
+
+<ul>
+  <li>
+    <p>If at all possible, run Spark on the same nodes as HDFS. The simplest way is to set up a Spark
+<a href="spark-standalone.html">standalone mode cluster</a> on the same nodes, and configure Spark and
+Hadoop&#8217;s memory and CPU usage to avoid interference (for Hadoop, the relevant options are
+<code>mapred.child.java.opts</code> for the per-task memory and <code>mapred.tasktracker.map.tasks.maximum</code>
+and <code>mapred.tasktracker.reduce.tasks.maximum</code> for number of tasks). Alternatively, you can run
+Hadoop and Spark on a common cluster manager like <a href="running-on-mesos.html">Mesos</a> or
+<a href="running-on-yarn.html">Hadoop YARN</a>.</p>
+  </li>
+  <li>
+    <p>If this is not possible, run Spark on different nodes in the same local-area network as HDFS.</p>
+  </li>
+  <li>
+    <p>For low-latency data stores like HBase, it may be preferable to run computing jobs on different
+nodes than the storage system to avoid interference.</p>
+  </li>
+</ul>
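+<p>As a rough sketch of the co-location setup, here is how the memory and CPU split might look on
+a 16-core, 64&#160;GB node (the specific numbers are illustrative assumptions, not recommendations):</p>
+
+<pre><code># Hadoop side (mapred-site.xml): cap MapReduce at 8 map + 4 reduce slots, 1 GB each
+#   mapred.tasktracker.map.tasks.maximum    = 8
+#   mapred.tasktracker.reduce.tasks.maximum = 4
+#   mapred.child.java.opts                  = -Xmx1g
+
+# Spark side (conf/spark-env.sh): give Spark the remaining cores and some of the memory
+export SPARK_WORKER_CORES=4
+export SPARK_WORKER_MEMORY=24g
+</code></pre>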
+
+<h1 id="local-disks">Local Disks</h1>
+
+<p>While Spark can perform a lot of its computation in memory, it still uses local disks to store
+data that doesn&#8217;t fit in RAM, as well as to preserve intermediate output between stages. We
+recommend having <strong>4-8 disks</strong> per node, configured <em>without</em> RAID (just as separate mount points).
+In Linux, mount the disks with the <a href="http://www.centos.org/docs/5/html/Global_File_System/s2-manage-mountnoatime.html"><code>noatime</code> option</a>
+to reduce unnecessary writes. In Spark, <a href="configuration.html">configure</a> the <code>spark.local.dir</code>
+property to be a comma-separated list of the local disks. If you are running HDFS, it&#8217;s fine to
+use the same disks as HDFS.</p>
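+<p>For example, with four data disks mounted under <code>/mnt</code> (hypothetical mount points),
+you could set the property as a JVM option in <code>conf/spark-env.sh</code>:</p>
+
+<pre><code># Spread shuffle and spill files across all four disks
+export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/disk1,/mnt/disk2,/mnt/disk3,/mnt/disk4"
+</code></pre>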
+
+<h1 id="memory">Memory</h1>
+
+<p>In general, Spark can run well with anywhere from <strong>8 GB to hundreds of gigabytes</strong> of memory per
+machine. In all cases, we recommend allocating at most 75% of the memory for Spark; leave the
+rest for the operating system and buffer cache.</p>
+
+<p>How much memory you will need will depend on your application. To determine how much your
+application uses for a certain dataset size, load part of your dataset in a Spark RDD and use the
+Storage tab of Spark&#8217;s monitoring UI (<code>http://&lt;driver-node&gt;:4040</code>) to see its size in memory.
+Note that memory usage is greatly affected by storage level and serialization format &#8211; see
+the <a href="tuning.html">tuning guide</a> for tips on how to reduce it.</p>
+
+<p>Finally, note that the Java VM does not always behave well with more than 200 GB of RAM. If you
+purchase machines with more RAM than this, you can run <em>multiple worker JVMs per node</em>. In
+Spark&#8217;s <a href="spark-standalone.html">standalone mode</a>, you can set the number of workers per node
+with the <code>SPARK_WORKER_INSTANCES</code> variable in <code>conf/spark-env.sh</code>, and the number of cores
+per worker with <code>SPARK_WORKER_CORES</code>.</p>
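+<p>For instance, on a node with 256&#160;GB of RAM (an assumed size for illustration), rather than
+one giant JVM you might run four workers of 48&#160;GB each, staying under both the ~200&#160;GB JVM
+limit and the 75% allocation guideline (4 &#215; 48&#160;GB = 192&#160;GB):</p>
+
+<pre><code># In conf/spark-env.sh on each 256 GB node
+export SPARK_WORKER_INSTANCES=4
+export SPARK_WORKER_MEMORY=48g
+</code></pre>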
+
+<h1 id="network">Network</h1>
+
+<p>In our experience, when the data is in memory, many Spark applications are network-bound.
+Using a <strong>10 Gigabit</strong> or higher network is the best way to make these applications faster.
+This is especially true for &#8220;distributed reduce&#8221; applications such as group-bys, reduce-bys, and
+SQL joins. In any given application, you can see how much data Spark shuffles across the network
+from the application&#8217;s monitoring UI (<code>http://&lt;driver-node&gt;:4040</code>).</p>
+
+<h1 id="cpu-cores">CPU Cores</h1>
+
+<p>Spark scales well to tens of CPU cores per machine because it performs minimal sharing between
+threads. You should likely provision at least <strong>8-16 cores</strong> per machine. Depending on the CPU
+cost of your workload, you may also need more: once data is in memory, most applications are
+either CPU- or network-bound.</p>
+
+            <!-- Main hero unit for a primary marketing message or call to action -->
+            <!--<div class="hero-unit">
+                <h1>Hello, world!</h1>
+                <p>This is a template for a simple marketing or informational website. It includes a large callout called the hero unit and three supporting pieces of content. Use it as a starting point to create something more unique.</p>
+                <p><a class="btn btn-primary btn-large">Learn more &raquo;</a></p>
+            </div>-->
+
+            <!-- Example row of columns -->
+            <!--<div class="row">
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+                </div>
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+               </div>
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec sed odio dui. Cras justo odio, dapibus ac facilisis in, egestas eget quam. Vestibulum id ligula porta felis euismod semper. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.</p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+                </div>
+            </div>
+
+            <hr>-->
+
+            <footer>
+              <hr>
+              <p style="text-align: center; vertical-align: middle; color: #999;">
+                Apache Spark is an effort undergoing incubation at the Apache Software Foundation.
+                <a href="http://incubator.apache.org">
+                  <img style="margin-left: 20px;" src="img/incubator-logo.png" />
+                </a>
+              </p>
+            </footer>
+
+        </div> <!-- /container -->
+
+        <script src="js/vendor/jquery-1.8.0.min.js"></script>
+        <script src="js/vendor/bootstrap.min.js"></script>
+        <script src="js/main.js"></script>
+        
+        <!-- A script to fix internal hash links because we have an overlapping top bar.
+             Based on https://github.com/twitter/bootstrap/issues/193#issuecomment-2281510 -->
+        <script>
+          $(function() {
+            function maybeScrollToHash() {
+              if (window.location.hash && $(window.location.hash).length) {
+                var newTop = $(window.location.hash).offset().top - $('#topbar').height() - 5;
+                $(window).scrollTop(newTop);
+              }
+            }
+            $(window).bind('hashchange', function() {
+              maybeScrollToHash();
+            });
+            // Scroll now too in case we had opened the page on a hash, but wait 1 ms because some browsers
+            // will try to do *their* initial scroll after running the onReady handler.
+            setTimeout(function() { maybeScrollToHash(); }, 1)
+          })
+        </script>
+
+    </body>
+</html>

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/cluster-overview.png
==============================================================================
Binary file - no diff available.

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/cluster-overview.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/cluster-overview.pptx
==============================================================================
Binary file - no diff available.

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/cluster-overview.pptx
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/glyphicons-halflings-white.png
==============================================================================
Binary file - no diff available.

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/glyphicons-halflings-white.png
------------------------------------------------------------------------------
    svn:executable = *

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/glyphicons-halflings-white.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/glyphicons-halflings.png
==============================================================================
Binary file - no diff available.

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/glyphicons-halflings.png
------------------------------------------------------------------------------
    svn:executable = *

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/glyphicons-halflings.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/incubator-logo.png
==============================================================================
Binary file - no diff available.

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/incubator-logo.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/spark-logo-100x40px.png
==============================================================================
Binary file - no diff available.

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/spark-logo-100x40px.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/spark-logo-77x40px-hd.png
==============================================================================
Binary file - no diff available.

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/spark-logo-77x40px-hd.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/spark-logo-77x50px-hd.png
==============================================================================
Binary file - no diff available.

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/spark-logo-77x50px-hd.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/spark-logo-hd.png
==============================================================================
Binary file - no diff available.

Propchange: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/img/spark-logo-hd.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/index.html
==============================================================================
--- dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/index.html (added)
+++ dev/incubator/spark/spark-0.8.0-incubating-rc6-docs/index.html Wed Sep 25 00:14:43 2013
@@ -0,0 +1,323 @@
+<!DOCTYPE html>
+<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
+<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
+<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
+    <head>
+        <meta charset="utf-8">
+        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
+        <title>Spark Overview - Spark 0.8.0 Documentation</title>
+        <meta name="description" content="">
+
+        <link rel="stylesheet" href="css/bootstrap.min.css">
+        <style>
+            body {
+                padding-top: 60px;
+                padding-bottom: 40px;
+            }
+        </style>
+        <meta name="viewport" content="width=device-width">
+        <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
+        <link rel="stylesheet" href="css/main.css">
+
+        <script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+        
+        <link rel="stylesheet" href="css/pygments-default.css">
+
+        <!-- Google analytics script -->
+        <script type="text/javascript">
+          /*
+          var _gaq = _gaq || [];
+          _gaq.push(['_setAccount', 'UA-32518208-1']);
+          _gaq.push(['_trackPageview']);
+
+          (function() {
+            var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+            ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+          })();
+          */
+        </script>
+
+    </head>
+    <body>
+        <!--[if lt IE 7]>
+            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
+        <![endif]-->
+
+        <!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html -->
+
+        <div class="navbar navbar-fixed-top" id="topbar">
+            <div class="navbar-inner">
+                <div class="container">
+                    <div class="brand"><a href="index.html">
+                      <img src="img/spark-logo-hd.png" style="height:50px;"/></a><span class="version">0.8.0</span>
+                    </div>
+                    <ul class="nav">
+                        <!--TODO(andyk): Add class="active" attribute to li somehow.-->
+                        <li><a href="index.html">Overview</a></li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guides<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="quick-start.html">Quick Start</a></li>
+                                <li><a href="scala-programming-guide.html">Spark in Scala</a></li>
+                                <li><a href="java-programming-guide.html">Spark in Java</a></li>
+                                <li><a href="python-programming-guide.html">Spark in Python</a></li>
+                                <li class="divider"></li>
+                                <li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
+                                <li><a href="mllib-guide.html">MLlib (Machine Learning)</a></li>
+                                <li><a href="bagel-programming-guide.html">Bagel (Pregel on Spark)</a></li>
+                            </ul>
+                        </li>
+                        
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">API Docs<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="api/core/index.html">Spark Core for Java/Scala</a></li>
+                                <li><a href="api/pyspark/index.html">Spark Core for Python</a></li>
+                                <li class="divider"></li>
+                                <li><a href="api/streaming/index.html">Spark Streaming</a></li>
+                                <li><a href="api/mllib/index.html">MLlib (Machine Learning)</a></li>
+                                <li><a href="api/bagel/index.html">Bagel (Pregel on Spark)</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Deploying<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="cluster-overview.html">Overview</a></li>
+                                <li><a href="ec2-scripts.html">Amazon EC2</a></li>
+                                <li><a href="spark-standalone.html">Standalone Mode</a></li>
+                                <li><a href="running-on-mesos.html">Mesos</a></li>
+                                <li><a href="running-on-yarn.html">YARN</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="api.html" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="configuration.html">Configuration</a></li>
+                                <li><a href="monitoring.html">Monitoring</a></li>
+                                <li><a href="tuning.html">Tuning Guide</a></li>
+                                <li><a href="hadoop-third-party-distributions.html">Running with CDH/HDP</a></li>
+                                <li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
+                                <li><a href="job-scheduling.html">Job Scheduling</a></li>
+                                <li class="divider"></li>
+                                <li><a href="building-with-maven.html">Building Spark with Maven</a></li>
+                                <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark</a></li>
+                            </ul>
+                        </li>
+                    </ul>
+                    <!--<p class="navbar-text pull-right"><span class="version-text">v0.8.0</span></p>-->
+                </div>
+            </div>
+        </div>
+
+        <div class="container" id="content">
+          <h1 class="title">Spark Overview</h1>
+
+          <p>Apache Spark is a fast and general-purpose cluster computing system.
+It provides high-level APIs in <a href="scala-programming-guide.html">Scala</a>, <a href="java-programming-guide.html">Java</a>, and <a href="python-programming-guide.html">Python</a> that make parallel jobs easy to write, and an optimized engine that supports general computation graphs.
+It also supports a rich set of higher-level tools including <a href="http://shark.cs.berkeley.edu">Shark</a> (Hive on Spark), <a href="mllib-guide.html">MLlib</a> for machine learning, <a href="bagel-programming-guide.html">Bagel</a> for graph processing, and <a href="streaming-programming-guide.html">Spark Streaming</a>.</p>
+
+<h1 id="downloading">Downloading</h1>
+
+<p>Get Spark by visiting the <a href="http://spark.incubator.apache.org/downloads.html">downloads page</a> of the Apache Spark site. This documentation is for Spark version 0.8.0-incubating.</p>
+
+<p>Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). All you need to run it is to have <code>java</code> installed on your system <code>PATH</code>, or the <code>JAVA_HOME</code> environment variable pointing to a Java installation.</p>
+
+<h1 id="building">Building</h1>
+
+<p>Spark uses <a href="http://www.scala-sbt.org">Simple Build Tool</a>, which is bundled with it. To compile the code, go into the top-level Spark directory and run</p>
+
+<pre><code>sbt/sbt assembly
+</code></pre>
+
+<p>For its Scala API, Spark 0.8.0-incubating depends on Scala 2.9.3. If you write applications in Scala, you will need to use this same version of Scala in your own program &#8211; newer major versions may not work. You can get the right version of Scala from <a href="http://www.scala-lang.org/download/">scala-lang.org</a>.</p>
+
+<h1 id="running-the-examples-and-shell">Running the Examples and Shell</h1>
+
+<p>Spark comes with several sample programs in the <code>examples</code> directory.
+To run one of the samples, use <code>./run-example &lt;class&gt; &lt;params&gt;</code> in the top-level Spark directory
+(the <code>run-example</code> script sets up the appropriate paths and launches that program).
+For example, try <code>./run-example org.apache.spark.examples.SparkPi local</code>.
+Each example prints usage help when run with no parameters.</p>
+
+<p>Note that all of the sample programs take a <code>&lt;master&gt;</code> parameter specifying the cluster URL
+to connect to. This can be a <a href="scala-programming-guide.html#master-urls">URL for a distributed cluster</a>,
+or <code>local</code> to run locally with one thread, or <code>local[N]</code> to run locally with N threads. You should start by using
+<code>local</code> for testing.</p>
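+<p>For example, using the <code>SparkPi</code> program above, the three forms of <code>&lt;master&gt;</code> look like this (the <code>spark://</code> URL is illustrative; substitute your own cluster's master host and port):</p>
+
+<pre><code># one local thread
+./run-example org.apache.spark.examples.SparkPi local
+# four local threads
+./run-example org.apache.spark.examples.SparkPi local[4]
+# a standalone cluster's master URL
+./run-example org.apache.spark.examples.SparkPi spark://host:7077
+</code></pre>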
+
+<p>Finally, you can run Spark interactively through modified versions of the Scala shell (<code>./spark-shell</code>) or
+Python interpreter (<code>./pyspark</code>). These are a great way to learn the framework.</p>
+
+<h1 id="launching-on-a-cluster">Launching on a Cluster</h1>
+
+<p>The Spark <a href="cluster-overview.html">cluster mode overview</a> explains the key concepts in running on a cluster.
+Spark can run by itself, or over several existing cluster managers. It currently provides several
+options for deployment:</p>
+
+<ul>
+  <li><a href="ec2-scripts.html">Amazon EC2</a>: our EC2 scripts let you launch a cluster in about 5 minutes</li>
+  <li><a href="spark-standalone.html">Standalone Deploy Mode</a>: simplest way to deploy Spark on a private cluster</li>
+  <li><a href="running-on-mesos.html">Apache Mesos</a></li>
+  <li><a href="running-on-yarn.html">Hadoop YARN</a></li>
+</ul>
+
+<h1 id="a-note-about-hadoop-versions">A Note About Hadoop Versions</h1>
+
+<p>Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported
+storage systems. Because the HDFS protocol has changed in different versions of
+Hadoop, you must build Spark against the same version that your cluster uses.
+By default, Spark links to Hadoop 1.0.4. You can change this by setting the
+<code>SPARK_HADOOP_VERSION</code> variable when compiling:</p>
+
+<pre><code>SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
+</code></pre>
+
+<p>In addition, if you wish to run Spark on <a href="running-on-yarn.html">YARN</a>, set
+<code>SPARK_YARN</code> to <code>true</code>:</p>
+
+<pre><code>SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
+</code></pre>
+
+<p>(Note that on Windows, you need to set the environment variables on separate lines, e.g., <code>set SPARK_HADOOP_VERSION=1.2.1</code>.)</p>
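+<p>For instance, the YARN build above would look like this on Windows (a sketch using <code>cmd.exe</code> syntax; the exact sbt launcher invocation may differ on your setup):</p>
+
+<pre><code>set SPARK_HADOOP_VERSION=2.0.5-alpha
+set SPARK_YARN=true
+sbt\sbt assembly
+</code></pre>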
+
+<h1 id="where-to-go-from-here">Where to Go from Here</h1>
+
+<p><strong>Programming guides:</strong></p>
+
+<ul>
+  <li><a href="quick-start.html">Quick Start</a>: a quick introduction to the Spark API; start here!</li>
+  <li><a href="scala-programming-guide.html">Spark Programming Guide</a>: an overview of Spark concepts, and details on the Scala API
+    <ul>
+      <li><a href="java-programming-guide.html">Java Programming Guide</a>: using Spark from Java</li>
+      <li><a href="python-programming-guide.html">Python Programming Guide</a>: using Spark from Python</li>
+    </ul>
+  </li>
+  <li><a href="streaming-programming-guide.html">Spark Streaming</a>: using the alpha release of Spark Streaming</li>
+  <li><a href="mllib-guide.html">MLlib (Machine Learning)</a>: Spark&#8217;s built-in machine learning library</li>
+  <li><a href="bagel-programming-guide.html">Bagel (Pregel on Spark)</a>: simple graph processing model</li>
+</ul>
+
+<p><strong>API Docs:</strong></p>
+
+<ul>
+  <li><a href="api/core/index.html">Spark for Java/Scala (Scaladoc)</a></li>
+  <li><a href="api/pyspark/index.html">Spark for Python (Epydoc)</a></li>
+  <li><a href="api/streaming/index.html">Spark Streaming for Java/Scala (Scaladoc)</a></li>
+  <li><a href="api/mllib/index.html">MLlib (Machine Learning) for Java/Scala (Scaladoc)</a></li>
+  <li><a href="api/bagel/index.html">Bagel (Pregel on Spark) for Scala (Scaladoc)</a></li>
+</ul>
+
+<p><strong>Deployment guides:</strong></p>
+
+<ul>
+  <li><a href="cluster-overview.html">Cluster Overview</a>: overview of concepts and components when running on a cluster</li>
+  <li><a href="ec2-scripts.html">Amazon EC2</a>: scripts that let you launch a cluster on EC2 in about 5 minutes</li>
+  <li><a href="spark-standalone.html">Standalone Deploy Mode</a>: launch a standalone cluster quickly without a third-party cluster manager</li>
+  <li><a href="running-on-mesos.html">Mesos</a>: deploy a private cluster using
+  <a href="http://incubator.apache.org/mesos">Apache Mesos</a></li>
+  <li><a href="running-on-yarn.html">YARN</a>: deploy Spark on top of Hadoop NextGen (YARN)</li>
+</ul>
+
+<p><strong>Other documents:</strong></p>
+
+<ul>
+  <li><a href="configuration.html">Configuration</a>: customize Spark via its configuration system</li>
+  <li><a href="tuning.html">Tuning Guide</a>: best practices to optimize performance and memory use</li>
+  <li><a href="hardware-provisioning.html">Hardware Provisioning</a>: recommendations for cluster hardware</li>
+  <li><a href="job-scheduling.html">Job Scheduling</a>: scheduling resources across and within Spark applications</li>
+  <li><a href="building-with-maven.html">Building Spark with Maven</a>: build Spark using the Maven system</li>
+  <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark</a></li>
+</ul>
+
+<p><strong>External resources:</strong></p>
+
+<ul>
+  <li><a href="http://spark.incubator.apache.org">Spark Homepage</a></li>
+  <li><a href="http://shark.cs.berkeley.edu">Shark</a>: Apache Hive over Spark</li>
+  <li><a href="http://spark.incubator.apache.org/mailing-lists.html">Mailing Lists</a>: ask questions about Spark here</li>
+  <li><a href="http://ampcamp.berkeley.edu/">AMP Camps</a>: a series of training camps at UC Berkeley that featured talks and
+exercises about Spark, Shark, Mesos, and more. <a href="http://ampcamp.berkeley.edu/agenda-2012">Videos</a>,
+<a href="http://ampcamp.berkeley.edu/agenda-2012">slides</a> and <a href="http://ampcamp.berkeley.edu/exercises-2012">exercises</a> are
+available online for free.</li>
+  <li><a href="http://spark.incubator.apache.org/examples.html">Code Examples</a>: more are also available in the <a href="https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/">examples subfolder</a> of Spark</li>
+  <li><a href="http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf">Paper Describing Spark</a></li>
+  <li><a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf">Paper Describing Spark Streaming</a></li>
+</ul>
+
+<h1 id="community">Community</h1>
+
+<p>To get help using Spark or keep up with Spark development, sign up for the <a href="http://spark.incubator.apache.org/mailing-lists.html">user mailing list</a>.</p>
+
+<p>If you&#8217;re in the San Francisco Bay Area, there&#8217;s a regular <a href="http://www.meetup.com/spark-users/">Spark meetup</a> every few weeks. Come by to meet the developers and other users.</p>
+
+<p>Finally, if you&#8217;d like to contribute code to Spark, read <a href="contributing-to-spark.html">how to contribute</a>.</p>
+
+            <!-- Main hero unit for a primary marketing message or call to action -->
+            <!--<div class="hero-unit">
+                <h1>Hello, world!</h1>
+                <p>This is a template for a simple marketing or informational website. It includes a large callout called the hero unit and three supporting pieces of content. Use it as a starting point to create something more unique.</p>
+                <p><a class="btn btn-primary btn-large">Learn more &raquo;</a></p>
+            </div>-->
+
+            <!-- Example row of columns -->
+            <!--<div class="row">
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+                </div>
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+               </div>
+                <div class="span4">
+                    <h2>Heading</h2>
+                    <p>Donec sed odio dui. Cras justo odio, dapibus ac facilisis in, egestas eget quam. Vestibulum id ligula porta felis euismod semper. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.</p>
+                    <p><a class="btn" href="#">View details &raquo;</a></p>
+                </div>
+            </div>
+
+            <hr>-->
+
+            <footer>
+              <hr>
+              <p style="text-align: center; vertical-align: middle; color: #999;">
+                Apache Spark is an effort undergoing incubation at the Apache Software Foundation.
+                <a href="http://incubator.apache.org">
+                  <img style="margin-left: 20px;" src="img/incubator-logo.png" />
+                </a>
+              </p>
+            </footer>
+
+        </div> <!-- /container -->
+
+        <script src="js/vendor/jquery-1.8.0.min.js"></script>
+        <script src="js/vendor/bootstrap.min.js"></script>
+        <script src="js/main.js"></script>
+        
+        <!-- A script to fix internal hash links because we have an overlapping top bar.
+             Based on https://github.com/twitter/bootstrap/issues/193#issuecomment-2281510 -->
+        <script>
+          $(function() {
+            function maybeScrollToHash() {
+              if (window.location.hash && $(window.location.hash).length) {
+                var newTop = $(window.location.hash).offset().top - $('#topbar').height() - 5;
+                $(window).scrollTop(newTop);
+              }
+            }
+            $(window).bind('hashchange', function() {
+              maybeScrollToHash();
+            });
+            // Scroll now too in case we had opened the page on a hash, but wait 1 ms because some browsers
+            // will try to do *their* initial scroll after running the onReady handler.
+            setTimeout(function() { maybeScrollToHash(); }, 1)
+          })
+        </script>
+
+    </body>
+</html>