Posted to commits@spark.apache.org by pw...@apache.org on 2014/04/22 06:58:15 UTC

[1/2] [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs

Repository: spark
Updated Branches:
  refs/heads/master 04c37b6f7 -> fc7838470


http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/project/plugins.sbt
----------------------------------------------------------------------
diff --git a/project/plugins.sbt b/project/plugins.sbt
index c25a258..0cd16fd 100644
--- a/project/plugins.sbt
+++ b/project/plugins.sbt
@@ -23,3 +23,4 @@ addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.1.6")
 
 addSbtPlugin("com.alpinenow" % "junit_xml_listener" % "0.5.0")
 
+addSbtPlugin("com.eed3si9n" % "sbt-unidoc" % "0.3.0")

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/python/epydoc.conf
----------------------------------------------------------------------
diff --git a/python/epydoc.conf b/python/epydoc.conf
index 081ed21..b73860b 100644
--- a/python/epydoc.conf
+++ b/python/epydoc.conf
@@ -18,8 +18,8 @@
 #
 
 # Information about the project.
-name: PySpark
-url: http://spark-project.org
+name: Spark 1.0.0 Python API Docs
+url: http://spark.apache.org
 
 # The list of modules to document.  Modules can be named using
 # dotted names, module filenames, or package directory names.


[2/2] git commit: [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs

Posted by pw...@apache.org.
[SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs

I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and to generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages from the Javadoc: there is an SBT task that identifies the Java sources to run javadoc on, but it's been very difficult to modify it from outside to change what the unidoc package sets. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and similar niceties, so we may want to look into that. We may decide not to post the Javadocs right now if they're too limited compared to the Scaladoc.
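For anyone wanting to reproduce this setup outside Spark, below is a minimal sketch of how sbt-unidoc 0.3.x is wired into an sbt 0.13 build, assuming the addSbtPlugin line from the plugins.sbt hunk above. This is not the actual SparkBuild.scala change: the project names are made up, only the Scaladoc half is shown, and the key names (unidocSettings, unidocProjectFilter, ScalaUnidoc) are taken from the plugin's README, so double-check them against the plugin version you use.

import sbt._
import sbt.Keys._
import sbtunidoc.Plugin._
import sbtunidoc.Plugin.UnidocKeys._

object ExampleBuild extends Build {
  // Two illustrative sub-projects; Spark's real build has many more.
  lazy val core = Project("core", file("core"))
  lazy val repl = Project("repl", file("repl")).dependsOn(core)

  // The aggregating root project gets the unidoc settings, so `sbt unidoc`
  // produces one combined Scaladoc site under target/scala-*/unidoc.
  lazy val root = Project("root", file("."))
    .aggregate(core, repl)
    .settings(unidocSettings: _*)
    .settings(
      // Leave internal modules (here: the REPL) out of the public docs.
      unidocProjectFilter in (ScalaUnidoc, unidoc) := inAnyProject -- inProjects(repl)
    )
}

The Javadoc side is driven by the plugin's Java/genjavadoc settings in the same way; its combined output lands under target/javaunidoc, which is what the updated copy_api_dirs.rb below copies into api/java (and target/scala-2.10/unidoc into api/scala).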

Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/

Author: Matei Zaharia <ma...@databricks.com>

This patch had conflicts when merged, resolved by
Committer: Patrick Wendell <pw...@gmail.com>

Closes #457 from mateiz/better-docs and squashes the following commits:

a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package
5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load
f05abc0 [Matei Zaharia] Don't include java.lang package names
995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc
a14a93c [Matei Zaharia] typo
76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java
ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs
acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc783847
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc783847
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc783847

Branch: refs/heads/master
Commit: fc7838470465474f777bd17791c1bb5f9c348521
Parents: 04c37b6
Author: Matei Zaharia <ma...@databricks.com>
Authored: Mon Apr 21 21:57:40 2014 -0700
Committer: Patrick Wendell <pw...@gmail.com>
Committed: Mon Apr 21 21:57:40 2014 -0700

----------------------------------------------------------------------
 .../java/org/apache/spark/package-info.java     |  23 ++++
 .../main/scala/org/apache/spark/package.scala   |   2 +-
 docs/_layouts/global.html                       | 105 ++++---------------
 docs/_plugins/copy_api_dirs.rb                  |  65 ++++++------
 docs/api.md                                     |  13 +--
 docs/configuration.md                           |   8 +-
 docs/graphx-programming-guide.md                |  62 +++++------
 docs/index.md                                   |  10 +-
 docs/java-programming-guide.md                  |  55 +++++-----
 docs/js/main.js                                 |  18 ++++
 docs/mllib-classification-regression.md         |  14 +--
 docs/mllib-clustering.md                        |   2 +-
 docs/mllib-collaborative-filtering.md           |   2 +-
 docs/mllib-guide.md                             |  10 +-
 docs/mllib-optimization.md                      |   8 +-
 docs/python-programming-guide.md                |   4 +-
 docs/quick-start.md                             |   6 +-
 docs/scala-programming-guide.md                 |  10 +-
 docs/sql-programming-guide.md                   |  20 ++--
 docs/streaming-custom-receivers.md              |   4 +-
 docs/streaming-programming-guide.md             |  56 +++++-----
 docs/tuning.md                                  |   4 +-
 project/SparkBuild.scala                        |  74 ++++++++++---
 project/plugins.sbt                             |   1 +
 python/epydoc.conf                              |   4 +-
 25 files changed, 296 insertions(+), 284 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/core/src/main/java/org/apache/spark/package-info.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/spark/package-info.java b/core/src/main/java/org/apache/spark/package-info.java
new file mode 100644
index 0000000..4426c7a
--- /dev/null
+++ b/core/src/main/java/org/apache/spark/package-info.java
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Core Spark classes in Scala. A few classes here, such as {@link org.apache.spark.Accumulator}
+ * and {@link org.apache.spark.storage.StorageLevel}, are also used in Java, but the
+ * {@link org.apache.spark.api.java} package contains the main Java API.
+ */
+package org.apache.spark;

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/core/src/main/scala/org/apache/spark/package.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/package.scala b/core/src/main/scala/org/apache/spark/package.scala
index 59bbb11..5cdbc30 100644
--- a/core/src/main/scala/org/apache/spark/package.scala
+++ b/core/src/main/scala/org/apache/spark/package.scala
@@ -30,7 +30,7 @@ package org.apache
  * type (e.g. RDD[(Int, Int)] through implicit conversions when you
  * `import org.apache.spark.SparkContext._`.
  *
- * Java programmers should reference the [[spark.api.java]] package
+ * Java programmers should reference the [[org.apache.spark.api.java]] package
  * for Spark programming APIs in Java.
  *
  * Classes and methods marked with <span class="experimental badge" style="float: none;">

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/_layouts/global.html
----------------------------------------------------------------------
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 5d4dbb7..8b543de 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -76,32 +76,9 @@
                         <li class="dropdown">
                             <a href="#" class="dropdown-toggle" data-toggle="dropdown">API Docs<b class="caret"></b></a>
                             <ul class="dropdown-menu">
-                                <li><a href="api/core/index.html#org.apache.spark.package">Spark Core for Java/Scala</a></li>
-                                <li><a href="api/pyspark/index.html">Spark Core for Python</a></li>
-                                <li class="divider"></li>
-                                <li><a href="api/streaming/index.html#org.apache.spark.streaming.package">Spark Streaming</a></li>
-                                <li class="dropdown-submenu">
-                                    <a tabindex="-1" href="#">Spark SQL</a>
-                                    <ul class="dropdown-menu">
-                                        <li><a href="api/sql/core/org/apache/spark/sql/SQLContext.html">Spark SQL Core</a></li>
-                                        <li><a href="api/sql/hive/org/apache/spark/sql/hive/package.html">Hive Support</a></li>
-                                        <li><a href="api/sql/catalyst/org/apache/spark/sql/catalyst/package.html">Catalyst (Optimization)</a></li>
-                                    </ul>
-                                </li>
-                                <li><a href="api/mllib/index.html#org.apache.spark.mllib.package">MLlib (Machine Learning)</a></li>
-                                <li><a href="api/bagel/index.html#org.apache.spark.bagel.package">Bagel (Pregel on Spark)</a></li>
-                                <li><a href="api/graphx/index.html#org.apache.spark.graphx.package">GraphX (Graph Processing)</a></li>
-                                <li class="divider"></li>
-                                <li class="dropdown-submenu">
-                                    <a tabindex="-1" href="#">External Data Sources</a>
-                                    <ul class="dropdown-menu">
-                                        <li><a href="api/external/kafka/index.html#org.apache.spark.streaming.kafka.KafkaUtils$">Kafka</a></li>
-                                        <li><a href="api/external/flume/index.html#org.apache.spark.streaming.flume.FlumeUtils$">Flume</a></li>
-                                        <li><a href="api/external/twitter/index.html#org.apache.spark.streaming.twitter.TwitterUtils$">Twitter</a></li>
-                                        <li><a href="api/external/zeromq/index.html#org.apache.spark.streaming.zeromq.ZeroMQUtils$">ZeroMQ</a></li>
-                                        <li><a href="api/external/mqtt/index.html#org.apache.spark.streaming.mqtt.MQTTUtils$">MQTT</a></li>
-                                    </ul>
-                                </li>
+                                <li><a href="api/scala/index.html#org.apache.spark.package">Scaladoc</a></li>
+                                <li><a href="api/java/index.html">Javadoc</a></li>
+                                <li><a href="api/python/index.html">Python API</a></li>
                             </ul>
                         </li>
 
@@ -140,33 +117,6 @@
           <h1 class="title">{{ page.title }}</h1>
 
           {{ content }}
-            <!-- Main hero unit for a primary marketing message or call to action -->
-            <!--<div class="hero-unit">
-                <h1>Hello, world!</h1>
-                <p>This is a template for a simple marketing or informational website. It includes a large callout called the hero unit and three supporting pieces of content. Use it as a starting point to create something more unique.</p>
-                <p><a class="btn btn-primary btn-large">Learn more &raquo;</a></p>
-            </div>-->
-
-            <!-- Example row of columns -->
-            <!--<div class="row">
-                <div class="span4">
-                    <h2>Heading</h2>
-                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
-                    <p><a class="btn" href="#">View details &raquo;</a></p>
-                </div>
-                <div class="span4">
-                    <h2>Heading</h2>
-                    <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
-                    <p><a class="btn" href="#">View details &raquo;</a></p>
-               </div>
-                <div class="span4">
-                    <h2>Heading</h2>
-                    <p>Donec sed odio dui. Cras justo odio, dapibus ac facilisis in, egestas eget quam. Vestibulum id ligula porta felis euismod semper. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.</p>
-                    <p><a class="btn" href="#">View details &raquo;</a></p>
-                </div>
-            </div>
-
-            <hr>-->
 
         </div> <!-- /container -->
 
@@ -174,42 +124,23 @@
         <script src="js/vendor/bootstrap.min.js"></script>
         <script src="js/main.js"></script>
 
-        <!-- A script to fix internal hash links because we have an overlapping top bar.
-             Based on https://github.com/twitter/bootstrap/issues/193#issuecomment-2281510 -->
+        <!-- MathJax Section -->
+        <script type="text/x-mathjax-config">
+              MathJax.Hub.Config({
+                TeX: { equationNumbers: { autoNumber: "AMS" } }
+              });
+            </script>
+        <script type="text/javascript"
+         src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
         <script>
-          $(function() {
-            function maybeScrollToHash() {
-              if (window.location.hash && $(window.location.hash).length) {
-                var newTop = $(window.location.hash).offset().top - $('#topbar').height() - 5;
-                $(window).scrollTop(newTop);
-              }
+          MathJax.Hub.Config({
+            tex2jax: {
+              inlineMath: [ ["$", "$"], ["\\\\(","\\\\)"] ],
+              displayMath: [ ["$$","$$"], ["\\[", "\\]"] ], 
+              processEscapes: true,
+              skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
             }
-            $(window).bind('hashchange', function() {
-              maybeScrollToHash();
-            });
-            // Scroll now too in case we had opened the page on a hash, but wait 1 ms because some browsers
-            // will try to do *their* initial scroll after running the onReady handler.
-            setTimeout(function() { maybeScrollToHash(); }, 1)
-          })
+          });
         </script>
-
     </body>
-    <!-- MathJax Section -->
-    <script type="text/x-mathjax-config">
-	  MathJax.Hub.Config({
-	    TeX: { equationNumbers: { autoNumber: "AMS" } }
-	  });
-	</script>
-    <script type="text/javascript"
-     src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
-    <script>
-      MathJax.Hub.Config({
-        tex2jax: {
-          inlineMath: [ ["$", "$"], ["\\\\(","\\\\)"] ],
-          displayMath: [ ["$$","$$"], ["\\[", "\\]"] ], 
-          processEscapes: true,
-          skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
-        }
-      });
-    </script>
 </html>

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/_plugins/copy_api_dirs.rb
----------------------------------------------------------------------
diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb
index 05f0bd4..2dbbbf6 100644
--- a/docs/_plugins/copy_api_dirs.rb
+++ b/docs/_plugins/copy_api_dirs.rb
@@ -20,47 +20,48 @@ include FileUtils
 
 if not (ENV['SKIP_API'] == '1' or ENV['SKIP_SCALADOC'] == '1')
   # Build Scaladoc for Java/Scala
-  core_projects = ["core", "examples", "repl", "bagel", "graphx", "streaming", "mllib"]
-  external_projects = ["flume", "kafka", "mqtt", "twitter", "zeromq"]
-  sql_projects = ["catalyst", "core", "hive"]
 
-  projects = core_projects
-  projects = projects + external_projects.map { |project_name| "external/" + project_name }
-  projects = projects + sql_projects.map { |project_name| "sql/" + project_name }
-
-  puts "Moving to project root and building scaladoc."
+  puts "Moving to project root and building API docs."
   curr_dir = pwd
   cd("..")
 
-  puts "Running 'sbt/sbt doc hive/doc' from " + pwd + "; this may take a few minutes..."
-  puts `sbt/sbt doc hive/doc`
+  puts "Running 'sbt/sbt compile unidoc' from " + pwd + "; this may take a few minutes..."
+  puts `sbt/sbt compile unidoc`
 
   puts "Moving back into docs dir."
   cd("docs")
 
-  # Copy over the scaladoc from each project into the docs directory.
+  # Copy over the unified ScalaDoc for all projects to api/scala.
   # This directory will be copied over to _site when `jekyll` command is run.
-  projects.each do |project_name|
-    source = "../" + project_name + "/target/scala-2.10/api"
-    dest = "api/" + project_name
+  source = "../target/scala-2.10/unidoc"
+  dest = "api/scala"
+
+  puts "Making directory " + dest
+  mkdir_p dest
+
+  # From the rubydoc: cp_r('src', 'dest') makes src/dest, but this doesn't.
+  puts "cp -r " + source + "/. " + dest
+  cp_r(source + "/.", dest)
+
+  # Append custom JavaScript
+  js = File.readlines("./js/api-docs.js")
+  js_file = dest + "/lib/template.js"
+  File.open(js_file, 'a') { |f| f.write("\n" + js.join()) }
 
-    puts "making directory " + dest
-    mkdir_p dest
+  # Append custom CSS
+  css = File.readlines("./css/api-docs.css")
+  css_file = dest + "/lib/template.css"
+  File.open(css_file, 'a') { |f| f.write("\n" + css.join()) }
 
-    # From the rubydoc: cp_r('src', 'dest') makes src/dest, but this doesn't.
-    puts "cp -r " + source + "/. " + dest
-    cp_r(source + "/.", dest)
+  # Copy over the unified JavaDoc for all projects to api/java.
+  source = "../target/javaunidoc"
+  dest = "api/java"
 
-    # Append custom JavaScript
-    js = File.readlines("./js/api-docs.js")
-    js_file = dest + "/lib/template.js"
-    File.open(js_file, 'a') { |f| f.write("\n" + js.join()) }
+  puts "Making directory " + dest
+  mkdir_p dest
 
-    # Append custom CSS
-    css = File.readlines("./css/api-docs.css")
-    css_file = dest + "/lib/template.css"
-    File.open(css_file, 'a') { |f| f.write("\n" + css.join()) }
-  end
+  puts "cp -r " + source + "/. " + dest
+  cp_r(source + "/.", dest)
 
   # Build Epydoc for Python
   puts "Moving to python directory and building epydoc."
@@ -70,11 +71,11 @@ if not (ENV['SKIP_API'] == '1' or ENV['SKIP_SCALADOC'] == '1')
   puts "Moving back into docs dir."
   cd("../docs")
 
-  puts "echo making directory pyspark"
-  mkdir_p "pyspark"
+  puts "Making directory api/python"
+  mkdir_p "api/python"
 
-  puts "cp -r ../python/docs/. api/pyspark"
-  cp_r("../python/docs/.", "api/pyspark")
+  puts "cp -r ../python/docs/. api/python"
+  cp_r("../python/docs/.", "api/python")
 
   cd("..")
 end

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/api.md
----------------------------------------------------------------------
diff --git a/docs/api.md b/docs/api.md
index 91c8e51..0346038 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -1,13 +1,10 @@
 ---
 layout: global
-title: Spark API documentation (Scaladoc)
+title: Spark API Documentation
 ---
 
-Here you can find links to the Scaladoc generated for the Spark sbt subprojects.  If the following links don't work, try running `sbt/sbt doc` from the Spark project home directory.
+Here you can find API docs for Spark and its submodules.
 
-- [Spark](api/core/index.html)
-- [Spark Examples](api/examples/index.html)
-- [Spark Streaming](api/streaming/index.html)
-- [Bagel](api/bagel/index.html)
-- [GraphX](api/graphx/index.html)
-- [PySpark](api/pyspark/index.html)
+- [Spark Scala API (Scaladoc)](api/scala/index.html)
+- [Spark Java API (Javadoc)](api/java/index.html)
+- [Spark Python API (Epydoc)](api/python/index.html)

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 5a4abca..e7e1dd5 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -6,7 +6,7 @@ title: Spark Configuration
 Spark provides three locations to configure the system:
 
 * [Spark properties](#spark-properties) control most application parameters and can be set by passing
-  a [SparkConf](api/core/index.html#org.apache.spark.SparkConf) object to SparkContext, or through Java
+  a [SparkConf](api/scala/index.html#org.apache.spark.SparkConf) object to SparkContext, or through Java
   system properties.
 * [Environment variables](#environment-variables) can be used to set per-machine settings, such as
   the IP address, through the `conf/spark-env.sh` script on each node.
@@ -16,7 +16,7 @@ Spark provides three locations to configure the system:
 # Spark Properties
 
 Spark properties control most application settings and are configured separately for each application.
-The preferred way to set them is by passing a [SparkConf](api/core/index.html#org.apache.spark.SparkConf)
+The preferred way to set them is by passing a [SparkConf](api/scala/index.html#org.apache.spark.SparkConf)
 class to your SparkContext constructor.
 Alternatively, Spark will also load them from Java system properties, for compatibility with old versions
 of Spark.
@@ -53,7 +53,7 @@ there are at least five properties that you will commonly want to control:
     in serialized form. The default of Java serialization works with any Serializable Java object but is
     quite slow, so we recommend <a href="tuning.html">using <code>org.apache.spark.serializer.KryoSerializer</code>
     and configuring Kryo serialization</a> when speed is necessary. Can be any subclass of
-    <a href="api/core/index.html#org.apache.spark.serializer.Serializer"><code>org.apache.spark.Serializer</code></a>.
+    <a href="api/scala/index.html#org.apache.spark.serializer.Serializer"><code>org.apache.spark.Serializer</code></a>.
   </td>
 </tr>
 <tr>
@@ -62,7 +62,7 @@ there are at least five properties that you will commonly want to control:
   <td>
     If you use Kryo serialization, set this class to register your custom classes with Kryo.
     It should be set to a class that extends
-    <a href="api/core/index.html#org.apache.spark.serializer.KryoRegistrator"><code>KryoRegistrator</code></a>.
+    <a href="api/scala/index.html#org.apache.spark.serializer.KryoRegistrator"><code>KryoRegistrator</code></a>.
     See the <a href="tuning.html#data-serialization">tuning guide</a> for more details.
   </td>
 </tr>

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/graphx-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 1238e3e..07be8ba 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -17,7 +17,7 @@ title: GraphX Programming Guide
 # Overview
 
 GraphX is the new (alpha) Spark API for graphs and graph-parallel computation. At a high-level,
-GraphX extends the Spark [RDD](api/core/index.html#org.apache.spark.rdd.RDD) by introducing the
+GraphX extends the Spark [RDD](api/scala/index.html#org.apache.spark.rdd.RDD) by introducing the
 [Resilient Distributed Property Graph](#property_graph): a directed multigraph with properties
 attached to each vertex and edge.  To support graph computation, GraphX exposes a set of fundamental
 operators (e.g., [subgraph](#structural_operators), [joinVertices](#join_operators), and
@@ -82,7 +82,7 @@ Prior to the release of GraphX, graph computation in Spark was expressed using B
 implementation of Pregel.  GraphX improves upon Bagel by exposing a richer property graph API, a
 more streamlined version of the Pregel abstraction, and system optimizations to improve performance
 and reduce memory overhead.  While we plan to eventually deprecate Bagel, we will continue to
-support the [Bagel API](api/bagel/index.html#org.apache.spark.bagel.package) and
+support the [Bagel API](api/scala/index.html#org.apache.spark.bagel.package) and
 [Bagel programming guide](bagel-programming-guide.html). However, we encourage Bagel users to
 explore the new GraphX API and comment on issues that may complicate the transition from Bagel.
 
@@ -103,7 +103,7 @@ getting started with Spark refer to the [Spark Quick Start Guide](quick-start.ht
 # The Property Graph
 <a name="property_graph"></a>
 
-The [property graph](api/graphx/index.html#org.apache.spark.graphx.Graph) is a directed multigraph
+The [property graph](api/scala/index.html#org.apache.spark.graphx.Graph) is a directed multigraph
 with user defined objects attached to each vertex and edge.  A directed multigraph is a directed
 graph with potentially multiple parallel edges sharing the same source and destination vertex.  The
 ability to support parallel edges simplifies modeling scenarios where there can be multiple
@@ -179,7 +179,7 @@ val userGraph: Graph[(String, String), String]
 There are numerous ways to construct a property graph from raw files, RDDs, and even synthetic
 generators and these are discussed in more detail in the section on
 [graph builders](#graph_builders).  Probably the most general method is to use the
-[Graph object](api/graphx/index.html#org.apache.spark.graphx.Graph$).  For example the following
+[Graph object](api/scala/index.html#org.apache.spark.graphx.Graph$).  For example the following
 code constructs a graph from a collection of RDDs:
 
 {% highlight scala %}
@@ -203,7 +203,7 @@ In the above example we make use of the [`Edge`][Edge] case class. Edges have a
 `dstId` corresponding to the source and destination vertex identifiers. In addition, the `Edge`
 class has an `attr` member which stores the edge property.
 
-[Edge]: api/graphx/index.html#org.apache.spark.graphx.Edge
+[Edge]: api/scala/index.html#org.apache.spark.graphx.Edge
 
 We can deconstruct a graph into the respective vertex and edge views by using the `graph.vertices`
 and `graph.edges` members respectively.
@@ -229,7 +229,7 @@ The triplet view logically joins the vertex and edge properties yielding an
 `RDD[EdgeTriplet[VD, ED]]` containing instances of the [`EdgeTriplet`][EdgeTriplet] class. This
 *join* can be expressed in the following SQL expression:
 
-[EdgeTriplet]: api/graphx/index.html#org.apache.spark.graphx.EdgeTriplet
+[EdgeTriplet]: api/scala/index.html#org.apache.spark.graphx.EdgeTriplet
 
 {% highlight sql %}
 SELECT src.id, dst.id, src.attr, e.attr, dst.attr
@@ -270,8 +270,8 @@ core operators are defined in [`GraphOps`][GraphOps].  However, thanks to Scala
 operators in `GraphOps` are automatically available as members of `Graph`.  For example, we can
 compute the in-degree of each vertex (defined in `GraphOps`) by the following:
 
-[Graph]: api/graphx/index.html#org.apache.spark.graphx.Graph
-[GraphOps]: api/graphx/index.html#org.apache.spark.graphx.GraphOps
+[Graph]: api/scala/index.html#org.apache.spark.graphx.Graph
+[GraphOps]: api/scala/index.html#org.apache.spark.graphx.GraphOps
 
 {% highlight scala %}
 val graph: Graph[(String, String), String]
@@ -382,7 +382,7 @@ val newGraph = Graph(newVertices, graph.edges)
 val newGraph = graph.mapVertices((id, attr) => mapUdf(id, attr))
 {% endhighlight %}
 
-[Graph.mapVertices]: api/graphx/index.html#org.apache.spark.graphx.Graph@mapVertices[VD2]((VertexId,VD)⇒VD2)(ClassTag[VD2]):Graph[VD2,ED]
+[Graph.mapVertices]: api/scala/index.html#org.apache.spark.graphx.Graph@mapVertices[VD2]((VertexId,VD)⇒VD2)(ClassTag[VD2]):Graph[VD2,ED]
 
 These operators are often used to initialize the graph for a particular computation or project away
 unnecessary properties.  For example, given a graph with the out-degrees as the vertex properties
@@ -419,7 +419,7 @@ This can be useful when, for example, trying to compute the inverse PageRank.  B
 operation does not modify vertex or edge properties or change the number of edges, it can be
 implemented efficiently without data-movement or duplication.
 
-[Graph.reverse]: api/graphx/index.html#org.apache.spark.graphx.Graph@reverse:Graph[VD,ED]
+[Graph.reverse]: api/scala/index.html#org.apache.spark.graphx.Graph@reverse:Graph[VD,ED]
 
 The [`subgraph`][Graph.subgraph] operator takes vertex and edge predicates and returns the graph
 containing only the vertices that satisfy the vertex predicate (evaluate to true) and edges that
@@ -427,7 +427,7 @@ satisfy the edge predicate *and connect vertices that satisfy the vertex predica
 operator can be used in number of situations to restrict the graph to the vertices and edges of
 interest or eliminate broken links. For example in the following code we remove broken links:
 
-[Graph.subgraph]: api/graphx/index.html#org.apache.spark.graphx.Graph@subgraph((EdgeTriplet[VD,ED])⇒Boolean,(VertexId,VD)⇒Boolean):Graph[VD,ED]
+[Graph.subgraph]: api/scala/index.html#org.apache.spark.graphx.Graph@subgraph((EdgeTriplet[VD,ED])⇒Boolean,(VertexId,VD)⇒Boolean):Graph[VD,ED]
 
 {% highlight scala %}
 // Create an RDD for the vertices
@@ -467,7 +467,7 @@ vertices and edges that are also found in the input graph.  This can be used in
 example, we might run connected components using the graph with missing vertices and then restrict
 the answer to the valid subgraph.
 
-[Graph.mask]: api/graphx/index.html#org.apache.spark.graphx.Graph@mask[VD2,ED2](Graph[VD2,ED2])(ClassTag[VD2],ClassTag[ED2]):Graph[VD,ED]
+[Graph.mask]: api/scala/index.html#org.apache.spark.graphx.Graph@mask[VD2,ED2](Graph[VD2,ED2])(ClassTag[VD2],ClassTag[ED2]):Graph[VD,ED]
 
 {% highlight scala %}
 // Run Connected Components
@@ -482,7 +482,7 @@ The [`groupEdges`][Graph.groupEdges] operator merges parallel edges (i.e., dupli
 pairs of vertices) in the multigraph.  In many numerical applications, parallel edges can be *added*
 (their weights combined) into a single edge thereby reducing the size of the graph.
 
-[Graph.groupEdges]: api/graphx/index.html#org.apache.spark.graphx.Graph@groupEdges((ED,ED)⇒ED):Graph[VD,ED]
+[Graph.groupEdges]: api/scala/index.html#org.apache.spark.graphx.Graph@groupEdges((ED,ED)⇒ED):Graph[VD,ED]
 
 ## Join Operators
 <a name="join_operators"></a>
@@ -506,7 +506,7 @@ returns a new graph with the vertex properties obtained by applying the user def
 to the result of the joined vertices.  Vertices without a matching value in the RDD retain their
 original value.
 
-[GraphOps.joinVertices]: api/graphx/index.html#org.apache.spark.graphx.GraphOps@joinVertices[U](RDD[(VertexId,U)])((VertexId,VD,U)⇒VD)(ClassTag[U]):Graph[VD,ED]
+[GraphOps.joinVertices]: api/scala/index.html#org.apache.spark.graphx.GraphOps@joinVertices[U](RDD[(VertexId,U)])((VertexId,VD,U)⇒VD)(ClassTag[U]):Graph[VD,ED]
 
 > Note that if the RDD contains more than one value for a given vertex only one will be used.   It
 > is therefore recommended that the input RDD be first made unique using the following which will
@@ -525,7 +525,7 @@ property type.  Because not all vertices may have a matching value in the input
 function takes an `Option` type.  For example, we can setup a graph for PageRank by initializing
 vertex properties with their `outDegree`.
 
-[Graph.outerJoinVertices]: api/graphx/index.html#org.apache.spark.graphx.Graph@outerJoinVertices[U,VD2](RDD[(VertexId,U)])((VertexId,VD,Option[U])⇒VD2)(ClassTag[U],ClassTag[VD2]):Graph[VD2,ED]
+[Graph.outerJoinVertices]: api/scala/index.html#org.apache.spark.graphx.Graph@outerJoinVertices[U,VD2](RDD[(VertexId,U)])((VertexId,VD,Option[U])⇒VD2)(ClassTag[U],ClassTag[VD2]):Graph[VD2,ED]
 
 
 {% highlight scala %}
@@ -559,7 +559,7 @@ PageRank Value, shortest path to the source, and smallest reachable vertex id).
 ### Map Reduce Triplets (mapReduceTriplets)
 <a name="mrTriplets"></a>
 
-[Graph.mapReduceTriplets]: api/graphx/index.html#org.apache.spark.graphx.Graph@mapReduceTriplets[A](mapFunc:org.apache.spark.graphx.EdgeTriplet[VD,ED]=&gt;Iterator[(org.apache.spark.graphx.VertexId,A)],reduceFunc:(A,A)=&gt;A,activeSetOpt:Option[(org.apache.spark.graphx.VertexRDD[_],org.apache.spark.graphx.EdgeDirection)])(implicitevidence$10:scala.reflect.ClassTag[A]):org.apache.spark.graphx.VertexRDD[A]
+[Graph.mapReduceTriplets]: api/scala/index.html#org.apache.spark.graphx.Graph@mapReduceTriplets[A](mapFunc:org.apache.spark.graphx.EdgeTriplet[VD,ED]=&gt;Iterator[(org.apache.spark.graphx.VertexId,A)],reduceFunc:(A,A)=&gt;A,activeSetOpt:Option[(org.apache.spark.graphx.VertexRDD[_],org.apache.spark.graphx.EdgeDirection)])(implicitevidence$10:scala.reflect.ClassTag[A]):org.apache.spark.graphx.VertexRDD[A]
 
 The core (heavily optimized) aggregation primitive in GraphX is the
 [`mapReduceTriplets`][Graph.mapReduceTriplets] operator:
@@ -665,8 +665,8 @@ attributes at each vertex. This can be easily accomplished using the
 [`collectNeighborIds`][GraphOps.collectNeighborIds] and the
 [`collectNeighbors`][GraphOps.collectNeighbors] operators.
 
-[GraphOps.collectNeighborIds]: api/graphx/index.html#org.apache.spark.graphx.GraphOps@collectNeighborIds(EdgeDirection):VertexRDD[Array[VertexId]]
-[GraphOps.collectNeighbors]: api/graphx/index.html#org.apache.spark.graphx.GraphOps@collectNeighbors(EdgeDirection):VertexRDD[Array[(VertexId,VD)]]
+[GraphOps.collectNeighborIds]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighborIds(EdgeDirection):VertexRDD[Array[VertexId]]
+[GraphOps.collectNeighbors]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighbors(EdgeDirection):VertexRDD[Array[(VertexId,VD)]]
 
 
 {% highlight scala %}
@@ -685,7 +685,7 @@ class GraphOps[VD, ED] {
 In Spark, RDDs are not persisted in memory by default. To avoid recomputation, they must be explicitly cached when using them multiple times (see the [Spark Programming Guide][RDD Persistence]). Graphs in GraphX behave the same way. **When using a graph multiple times, make sure to call [`Graph.cache()`][Graph.cache] on it first.**
 
 [RDD Persistence]: scala-programming-guide.html#rdd-persistence
-[Graph.cache]: api/graphx/index.html#org.apache.spark.graphx.Graph@cache():Graph[VD,ED]
+[Graph.cache]: api/scala/index.html#org.apache.spark.graphx.Graph@cache():Graph[VD,ED]
 
 In iterative computations, *uncaching* may also be necessary for best performance. By default, cached RDDs and graphs will remain in memory until memory pressure forces them to be evicted in LRU order. For iterative computation, intermediate results from previous iterations will fill up the cache. Though they will eventually be evicted, the unnecessary data stored in memory will slow down garbage collection. It would be more efficient to uncache intermediate results as soon as they are no longer necessary. This involves materializing (caching and forcing) a graph or RDD every iteration, uncaching all other datasets, and only using the materialized dataset in future iterations. However, because graphs are composed of multiple RDDs, it can be difficult to unpersist them correctly. **For iterative computation we recommend using the Pregel API, which correctly unpersists intermediate results.**
 
@@ -716,7 +716,7 @@ messages remaining.
 The following is the type signature of the [Pregel operator][GraphOps.pregel] as well as a *sketch*
 of its implementation (note calls to graph.cache have been removed):
 
-[GraphOps.pregel]: api/graphx/index.html#org.apache.spark.graphx.GraphOps@pregel[A](A,Int,EdgeDirection)((VertexId,VD,A)⇒VD,(EdgeTriplet[VD,ED])⇒Iterator[(VertexId,A)],(A,A)⇒A)(ClassTag[A]):Graph[VD,ED]
+[GraphOps.pregel]: api/scala/index.html#org.apache.spark.graphx.GraphOps@pregel[A](A,Int,EdgeDirection)((VertexId,VD,A)⇒VD,(EdgeTriplet[VD,ED])⇒Iterator[(VertexId,A)],(A,A)⇒A)(ClassTag[A]):Graph[VD,ED]
 
 {% highlight scala %}
 class GraphOps[VD, ED] {
@@ -840,12 +840,12 @@ object Graph {
 
 [`Graph.fromEdgeTuples`][Graph.fromEdgeTuples] allows creating a graph from only an RDD of edge tuples, assigning the edges the value 1, and automatically creating any vertices mentioned by edges and assigning them the default value. It also supports deduplicating the edges; to deduplicate, pass `Some` of a [`PartitionStrategy`][PartitionStrategy] as the `uniqueEdges` parameter (for example, `uniqueEdges = Some(PartitionStrategy.RandomVertexCut)`). A partition strategy is necessary to colocate identical edges on the same partition so they can be deduplicated.
 
-[PartitionStrategy]: api/graphx/index.html#org.apache.spark.graphx.PartitionStrategy$
+[PartitionStrategy]: api/scala/index.html#org.apache.spark.graphx.PartitionStrategy$
 
-[GraphLoader.edgeListFile]: api/graphx/index.html#org.apache.spark.graphx.GraphLoader$@edgeListFile(SparkContext,String,Boolean,Int):Graph[Int,Int]
-[Graph.apply]: api/graphx/index.html#org.apache.spark.graphx.Graph$@apply[VD,ED](RDD[(VertexId,VD)],RDD[Edge[ED]],VD)(ClassTag[VD],ClassTag[ED]):Graph[VD,ED]
-[Graph.fromEdgeTuples]: api/graphx/index.html#org.apache.spark.graphx.Graph$@fromEdgeTuples[VD](RDD[(VertexId,VertexId)],VD,Option[PartitionStrategy])(ClassTag[VD]):Graph[VD,Int]
-[Graph.fromEdges]: api/graphx/index.html#org.apache.spark.graphx.Graph$@fromEdges[VD,ED](RDD[Edge[ED]],VD)(ClassTag[VD],ClassTag[ED]):Graph[VD,ED]
+[GraphLoader.edgeListFile]: api/scala/index.html#org.apache.spark.graphx.GraphLoader$@edgeListFile(SparkContext,String,Boolean,Int):Graph[Int,Int]
+[Graph.apply]: api/scala/index.html#org.apache.spark.graphx.Graph$@apply[VD,ED](RDD[(VertexId,VD)],RDD[Edge[ED]],VD)(ClassTag[VD],ClassTag[ED]):Graph[VD,ED]
+[Graph.fromEdgeTuples]: api/scala/index.html#org.apache.spark.graphx.Graph$@fromEdgeTuples[VD](RDD[(VertexId,VertexId)],VD,Option[PartitionStrategy])(ClassTag[VD]):Graph[VD,Int]
+[Graph.fromEdges]: api/scala/index.html#org.apache.spark.graphx.Graph$@fromEdges[VD,ED](RDD[Edge[ED]],VD)(ClassTag[VD],ClassTag[ED]):Graph[VD,ED]
 
 # Vertex and Edge RDDs
 <a name="vertex_and_edge_rdds"></a>
@@ -913,7 +913,7 @@ of the various partitioning strategies defined in [`PartitionStrategy`][Partitio
 each partition, edge attributes and adjacency structure, are stored separately enabling maximum
 reuse when changing attribute values.
 
-[PartitionStrategy]: api/graphx/index.html#org.apache.spark.graphx.PartitionStrategy
+[PartitionStrategy]: api/scala/index.html#org.apache.spark.graphx.PartitionStrategy
 
 The three additional functions exposed by the `EdgeRDD` are:
 {% highlight scala %}
@@ -952,7 +952,7 @@ the [`Graph.partitionBy`][Graph.partitionBy] operator.  The default partitioning
 the initial partitioning of the edges as provided on graph construction.  However, users can easily
 switch to 2D-partitioning or other heuristics included in GraphX.
 
-[Graph.partitionBy]: api/graphx/index.html#org.apache.spark.graphx.Graph$@partitionBy(partitionStrategy:org.apache.spark.graphx.PartitionStrategy):org.apache.spark.graphx.Graph[VD,ED]
+[Graph.partitionBy]: api/scala/index.html#org.apache.spark.graphx.Graph$@partitionBy(partitionStrategy:org.apache.spark.graphx.PartitionStrategy):org.apache.spark.graphx.Graph[VD,ED]
 
 <p style="text-align: center;">
   <img src="img/vertex_routing_edge_tables.png"
@@ -983,7 +983,7 @@ GraphX comes with static and dynamic implementations of PageRank as methods on t
 
 GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We compute the PageRank of each user as follows:
 
-[PageRank]: api/graphx/index.html#org.apache.spark.graphx.lib.PageRank$
+[PageRank]: api/scala/index.html#org.apache.spark.graphx.lib.PageRank$
 
 {% highlight scala %}
 // Load the edges as a graph
@@ -1006,7 +1006,7 @@ println(ranksByUsername.collect().mkString("\n"))
 
 The connected components algorithm labels each connected component of the graph with the ID of its lowest-numbered vertex. For example, in a social network, connected components can approximate clusters. GraphX contains an implementation of the algorithm in the [`ConnectedComponents` object][ConnectedComponents], and we compute the connected components of the example social network dataset from the [PageRank section](#pagerank) as follows:
 
-[ConnectedComponents]: api/graphx/index.html#org.apache.spark.graphx.lib.ConnectedComponents$
+[ConnectedComponents]: api/scala/index.html#org.apache.spark.graphx.lib.ConnectedComponents$
 
 {% highlight scala %}
 // Load the graph as in the PageRank example
@@ -1029,8 +1029,8 @@ println(ccByUsername.collect().mkString("\n"))
 
 A vertex is part of a triangle when it has two adjacent vertices with an edge between them. GraphX implements a triangle counting algorithm in the [`TriangleCount` object][TriangleCount] that determines the number of triangles passing through each vertex, providing a measure of clustering. We compute the triangle count of the social network dataset from the [PageRank section](#pagerank). *Note that `TriangleCount` requires the edges to be in canonical orientation (`srcId < dstId`) and the graph to be partitioned using [`Graph.partitionBy`][Graph.partitionBy].*
 
-[TriangleCount]: api/graphx/index.html#org.apache.spark.graphx.lib.TriangleCount$
-[Graph.partitionBy]: api/graphx/index.html#org.apache.spark.graphx.Graph@partitionBy(PartitionStrategy):Graph[VD,ED]
+[TriangleCount]: api/scala/index.html#org.apache.spark.graphx.lib.TriangleCount$
+[Graph.partitionBy]: api/scala/index.html#org.apache.spark.graphx.Graph@partitionBy(PartitionStrategy):Graph[VD,ED]
 
 {% highlight scala %}
 // Load the edges in canonical order and partition the graph for triangle count

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index 89ec5b0..6fc9a4f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -83,13 +83,9 @@ Note that on Windows, you need to set the environment variables on separate line
 
 **API Docs:**
 
-* [Spark for Java/Scala (Scaladoc)](api/core/index.html)
-* [Spark for Python (Epydoc)](api/pyspark/index.html)
-* [Spark Streaming for Java/Scala (Scaladoc)](api/streaming/index.html)
-* [MLlib (Machine Learning) for Java/Scala (Scaladoc)](api/mllib/index.html)
-* [Bagel (Pregel on Spark) for Scala (Scaladoc)](api/bagel/index.html)
-* [GraphX (Graphs on Spark) for Scala (Scaladoc)](api/graphx/index.html)
-
+* [Spark Scala API (Scaladoc)](api/scala/index.html#org.apache.spark.package)
+* [Spark Java API (Javadoc)](api/java/index.html)
+* [Spark Python API (Epydoc)](api/python/index.html)
 
 **Deployment guides:**
 

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/java-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/java-programming-guide.md b/docs/java-programming-guide.md
index 6632360..07c8512 100644
--- a/docs/java-programming-guide.md
+++ b/docs/java-programming-guide.md
@@ -10,9 +10,9 @@ easy to follow even if you don't know Scala.
 This guide will show how to use the Spark features described there in Java.
 
 The Spark Java API is defined in the
-[`org.apache.spark.api.java`](api/core/index.html#org.apache.spark.api.java.package) package, and includes
-a [`JavaSparkContext`](api/core/index.html#org.apache.spark.api.java.JavaSparkContext) for
-initializing Spark and [`JavaRDD`](api/core/index.html#org.apache.spark.api.java.JavaRDD) classes,
+[`org.apache.spark.api.java`](api/java/index.html?org/apache/spark/api/java/package-summary.html) package, and includes
+a [`JavaSparkContext`](api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html) for
+initializing Spark and [`JavaRDD`](api/java/index.html?org/apache/spark/api/java/JavaRDD.html) classes,
 which support the same methods as their Scala counterparts but take Java functions and return
 Java data and collection types. The main differences have to do with passing functions to RDD
 operations (e.g. map) and handling RDDs of different types, as discussed next.
@@ -23,19 +23,18 @@ There are a few key differences between the Java and Scala APIs:
 
 * Java does not support anonymous or first-class functions, so functions are passed
   using anonymous classes that implement the
-  [`org.apache.spark.api.java.function.Function`](api/core/index.html#org.apache.spark.api.java.function.Function),
-  [`Function2`](api/core/index.html#org.apache.spark.api.java.function.Function2), etc.
+  [`org.apache.spark.api.java.function.Function`](api/java/index.html?org/apache/spark/api/java/function/Function.html),
+  [`Function2`](api/java/index.html?org/apache/spark/api/java/function/Function2.html), etc.
   interfaces.
 * To maintain type safety, the Java API defines specialized Function and RDD
   classes for key-value pairs and doubles. For example, 
-  [`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD)
+  [`JavaPairRDD`](api/java/index.html?org/apache/spark/api/java/JavaPairRDD.html)
   stores key-value pairs.
-* Some methods are defined on the basis of the passed anonymous function's 
-  (a.k.a lambda expression) return type, 
-  for example mapToPair(...) or flatMapToPair returns
-  [`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD),
-  similarly mapToDouble and flatMapToDouble returns
-  [`JavaDoubleRDD`](api/core/index.html#org.apache.spark.api.java.JavaDoubleRDD).
+* Some methods are defined on the basis of the passed function's return type.
+  For example `mapToPair()` returns
+  [`JavaPairRDD`](api/java/index.html?org/apache/spark/api/java/JavaPairRDD.html),
+  and `mapToDouble()` returns
+  [`JavaDoubleRDD`](api/java/index.html?org/apache/spark/api/java/JavaDoubleRDD.html).
 * RDD methods like `collect()` and `countByKey()` return Java collections types,
   such as `java.util.List` and `java.util.Map`.
 * Key-value pairs, which are simply written as `(key, value)` in Scala, are represented
@@ -50,8 +49,8 @@ In the Scala API, these methods are automatically added using Scala's
 [implicit conversions](http://www.scala-lang.org/node/130) mechanism.
 
 In the Java API, the extra methods are defined in the
-[`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD)
-and [`JavaDoubleRDD`](api/core/index.html#org.apache.spark.api.java.JavaDoubleRDD)
+[`JavaPairRDD`](api/java/index.html?org/apache/spark/api/java/JavaPairRDD.html)
+and [`JavaDoubleRDD`](api/java/index.html?org/apache/spark/api/java/JavaDoubleRDD.html)
 classes.  RDD methods like `map` are overloaded by specialized `PairFunction`
 and `DoubleFunction` classes, allowing them to return RDDs of the appropriate
 types.  Common methods like `filter` and `sample` are implemented by
@@ -61,8 +60,9 @@ framework](http://docs.scala-lang.org/overviews/core/architecture-of-scala-colle
 
 ## Function Interfaces
 
-The following table lists the function interfaces used by the Java API.  Each
-interface has a single abstract method, `call()`, that must be implemented.
+The following table lists the function interfaces used by the Java API, located in the
+[`org.apache.spark.api.java.function`](api/java/index.html?org/apache/spark/api/java/function/package-summary.html)
+package. Each interface has a single abstract method, `call()`.
 
 <table class="table">
 <tr><th>Class</th><th>Function Type</th></tr>
@@ -81,7 +81,7 @@ interface has a single abstract method, `call()`, that must be implemented.
 ## Storage Levels
 
 RDD [storage level](scala-programming-guide.html#rdd-persistence) constants, such as `MEMORY_AND_DISK`, are
-declared in the [org.apache.spark.api.java.StorageLevels](api/core/index.html#org.apache.spark.api.java.StorageLevels) class. To
+declared in the [org.apache.spark.api.java.StorageLevels](api/java/index.html?org/apache/spark/api/java/StorageLevels.html) class. To
 define your own storage level, you can use StorageLevels.create(...). 
 
 # Other Features
@@ -101,11 +101,11 @@ the following changes:
   classes to interfaces. This means that concrete implementations of these 
   `Function` classes will need to use `implements` rather than `extends`.
 * Certain transformation functions now have multiple versions depending
-  on the return type. In Spark core, the map functions (map, flatMap,
-  mapPartitons) have type-specific versions, e.g. 
-  [`mapToPair`](api/core/index.html#org.apache.spark.api.java.JavaRDD@mapToPair[K2,V2](f:org.apache.spark.api.java.function.PairFunction[T,K2,V2]):org.apache.spark.api.java.JavaPairRDD[K2,V2])
-  and [`mapToDouble`](api/core/index.html#org.apache.spark.api.java.JavaRDD@mapToDouble[R](f:org.apache.spark.api.java.function.DoubleFunction[T]):org.apache.spark.api.java.JavaDoubleRDD).
-  Spark Streaming also uses the same approach, e.g. [`transformToPair`](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaDStream@transformToPair[K2,V2](transformFunc:org.apache.spark.api.java.function.Function[R,org.apache.spark.api.java.JavaPairRDD[K2,V2]]):org.apache.spark.streaming.api.java.JavaPairDStream[K2,V2]).
+  on the return type. In Spark core, the map functions (`map`, `flatMap`, and
+  `mapPartitons`) have type-specific versions, e.g. 
+  [`mapToPair`](api/java/org/apache/spark/api/java/JavaRDDLike.html#mapToPair(org.apache.spark.api.java.function.PairFunction))
+  and [`mapToDouble`](api/java/org/apache/spark/api/java/JavaRDDLike.html#mapToDouble(org.apache.spark.api.java.function.DoubleFunction)).
+  Spark Streaming also uses the same approach, e.g. [`transformToPair`](api/java/org/apache/spark/streaming/api/java/JavaDStreamLike.html#transformToPair(org.apache.spark.api.java.function.Function)).
 
 # Example
 
@@ -205,16 +205,9 @@ JavaPairRDD<String, Integer> counts = lines.flatMapToPair(
 There is no performance difference between these approaches; the choice is
 just a matter of style.
 
-# Javadoc
-
-We currently provide documentation for the Java API as Scaladoc, in the
-[`org.apache.spark.api.java` package](api/core/index.html#org.apache.spark.api.java.package), because
-some of the classes are implemented in Scala. It is important to note that the types and function
-definitions show Scala syntax (for example, `def reduce(func: Function2[T, T]): T` instead of
-`T reduce(Function2<T, T> func)`). In addition, the Scala `trait` modifier is used for Java
-interface classes. We hope to generate documentation with Java-style syntax in the future to
-avoid these quirks.
+# API Docs
 
+[API documentation](api/java/index.html) for Spark in Java is available in Javadoc format.
 
 # Where to Go from Here
 

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/js/main.js
----------------------------------------------------------------------
diff --git a/docs/js/main.js b/docs/js/main.js
index 0bd2286..5905546 100755
--- a/docs/js/main.js
+++ b/docs/js/main.js
@@ -73,8 +73,26 @@ function viewSolution() {
   });
 }
 
+// A script to fix internal hash links because we have an overlapping top bar.
+// Based on https://github.com/twitter/bootstrap/issues/193#issuecomment-2281510
+function maybeScrollToHash() {
+  console.log("HERE");
+  if (window.location.hash && $(window.location.hash).length) {
+    console.log("HERE2", $(window.location.hash), $(window.location.hash).offset().top);
+    var newTop = $(window.location.hash).offset().top - 57;
+    $(window).scrollTop(newTop);
+  }
+}
 
 $(function() {
   codeTabs();
   viewSolution();
+
+  $(window).bind('hashchange', function() {
+    maybeScrollToHash();
+  });
+
+  // Scroll now too in case we had opened the page on a hash, but wait a bit because some browsers
+  // will try to do *their* initial scroll after running the onReady handler.
+  $(window).load(function() { setTimeout(function() { maybeScrollToHash(); }, 25); }); 
 });

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/mllib-classification-regression.md
----------------------------------------------------------------------
diff --git a/docs/mllib-classification-regression.md b/docs/mllib-classification-regression.md
index 2c42f60..2e0fa09 100644
--- a/docs/mllib-classification-regression.md
+++ b/docs/mllib-classification-regression.md
@@ -316,26 +316,26 @@ For each of them, we support all 3 possible regularizations (none, L1 or L2).
 
 Available algorithms for binary classification:
 
-* [SVMWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.SVMWithSGD)
-* [LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithSGD)
+* [SVMWithSGD](api/scala/index.html#org.apache.spark.mllib.classification.SVMWithSGD)
+* [LogisticRegressionWithSGD](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithSGD)
 
 Available algorithms for linear regression: 
 
-* [LinearRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.regression.LinearRegressionWithSGD)
-* [RidgeRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.regression.RidgeRegressionWithSGD)
-* [LassoWithSGD](api/mllib/index.html#org.apache.spark.mllib.regression.LassoWithSGD)
+* [LinearRegressionWithSGD](api/scala/index.html#org.apache.spark.mllib.regression.LinearRegressionWithSGD)
+* [RidgeRegressionWithSGD](api/scala/index.html#org.apache.spark.mllib.regression.RidgeRegressionWithSGD)
+* [LassoWithSGD](api/scala/index.html#org.apache.spark.mllib.regression.LassoWithSGD)
 
 Behind the scenes, all above methods use the SGD implementation from the
 gradient descent primitive in MLlib, see the 
 <a href="mllib-optimization.html">optimization</a> part:
 
-* [GradientDescent](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
+* [GradientDescent](api/scala/index.html#org.apache.spark.mllib.optimization.GradientDescent)
 
 #### Tree-based Methods
 
 The decision tree algorithm supports binary classification and regression:
 
-* [DecisionTee](api/mllib/index.html#org.apache.spark.mllib.tree.DecisionTree)
+* [DecisionTee](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree)
 
 
 # Usage in Scala

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/mllib-clustering.md
----------------------------------------------------------------------
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 50a8671..0359c67 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -33,7 +33,7 @@ a given dataset, the algorithm returns the best clustering result).
 
 Available algorithms for clustering: 
 
-* [KMeans](api/mllib/index.html#org.apache.spark.mllib.clustering.KMeans)
+* [KMeans](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans)
 
 
 

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/mllib-collaborative-filtering.md
----------------------------------------------------------------------
diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index aa22f67..2f1f5f3 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -42,7 +42,7 @@ for an item.
 
 Available algorithms for collaborative filtering: 
 
-* [ALS](api/mllib/index.html#org.apache.spark.mllib.recommendation.ALS)
+* [ALS](api/scala/index.html#org.apache.spark.mllib.recommendation.ALS)
 
 
 # Usage in Scala

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 4236b0c..0963a99 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -36,15 +36,15 @@ The following links provide a detailed explanation of the methods and usage exam
 # Data Types
 
 Most MLlib algorithms operate on RDDs containing vectors. In Java and Scala, the
-[Vector](api/mllib/index.html#org.apache.spark.mllib.linalg.Vector) class is used to
+[Vector](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) class is used to
 represent vectors. You can create either dense or sparse vectors using the
-[Vectors](api/mllib/index.html#org.apache.spark.mllib.linalg.Vectors$) factory.
+[Vectors](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) factory.
 
 In Python, MLlib can take the following vector types:
 
 * [NumPy](http://www.numpy.org) arrays
 * Standard Python lists (e.g. `[1, 2, 3]`)
-* The MLlib [SparseVector](api/pyspark/pyspark.mllib.linalg.SparseVector-class.html) class
+* The MLlib [SparseVector](api/python/pyspark.mllib.linalg.SparseVector-class.html) class
 * [SciPy sparse matrices](http://docs.scipy.org/doc/scipy/reference/sparse.html)
 
 For efficiency, we recommend using NumPy arrays over lists, and using the
@@ -52,8 +52,8 @@ For efficiency, we recommend using NumPy arrays over lists, and using the
 for SciPy matrices, or MLlib's own SparseVector class.
 
 Several other simple data types are used throughout the library, e.g. the LabeledPoint
-class ([Java/Scala](api/mllib/index.html#org.apache.spark.mllib.regression.LabeledPoint),
-[Python](api/pyspark/pyspark.mllib.regression.LabeledPoint-class.html)) for labeled data.
+class ([Java/Scala](api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint),
+[Python](api/python/pyspark.mllib.regression.LabeledPoint-class.html)) for labeled data.
 
 # Dependencies
 MLlib uses the [jblas](https://github.com/mikiobraun/jblas) linear algebra library, which itself

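The Vector, Vectors, and LabeledPoint types referenced in the guide text above can be constructed directly; a brief Scala sketch of both dense and sparse forms:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // Dense vector [1.0, 0.0, 3.0].
    val dense = Vectors.dense(1.0, 0.0, 3.0)
    // The same vector in sparse form: size 3, non-zeros at indices 0 and 2.
    val sparse = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))
    // A labeled example pairing a label with a feature vector.
    val point = LabeledPoint(1.0, sparse)
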
http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/mllib-optimization.md
----------------------------------------------------------------------
diff --git a/docs/mllib-optimization.md b/docs/mllib-optimization.md
index 396b98d..c79cc3d 100644
--- a/docs/mllib-optimization.md
+++ b/docs/mllib-optimization.md
@@ -95,12 +95,12 @@ As an alternative to just use the subgradient `$R'(\wv)$` of the regularizer in
 direction, an improved update for some cases can be obtained by using the proximal operator
 instead.
 For the L1-regularizer, the proximal operator is given by soft thresholding, as implemented in
-[L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.L1Updater).
+[L1Updater](api/scala/index.html#org.apache.spark.mllib.optimization.L1Updater).
 
 
 ## Update Schemes for Distributed SGD
 The SGD implementation in
-[GradientDescent](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent) uses
+[GradientDescent](api/scala/index.html#org.apache.spark.mllib.optimization.GradientDescent) uses
 a simple (distributed) sampling of the data examples.
 We recall that the loss part of the optimization problem `$\eqref{eq:regPrimal}$` is
 `$\frac1n \sum_{i=1}^n L(\wv;\x_i,y_i)$`, and therefore `$\frac1n \sum_{i=1}^n L'_{\wv,i}$` would
@@ -138,7 +138,7 @@ are developed, see the
 section for example.
 
 The SGD method
-[GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
+[GradientDescent.runMiniBatchSGD](api/scala/index.html#org.apache.spark.mllib.optimization.GradientDescent)
 has the following parameters:
 
 * `gradient` is a class that computes the stochastic gradient of the function
@@ -161,6 +161,6 @@ each iteration, to compute the gradient direction.
 
 Available algorithms for gradient descent:
 
-* [GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
+* [GradientDescent.runMiniBatchSGD](api/scala/index.html#org.apache.spark.mllib.optimization.GradientDescent)
 
 

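To make the L1Updater and GradientDescent references above concrete, a hedged sketch of configuring the SGD optimizer of one of the linear methods; `training` is assumed to be an existing RDD[LabeledPoint], and the parameter values are arbitrary:

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.optimization.L1Updater

    // The algorithm's optimizer field is a GradientDescent instance; swap in
    // the soft-thresholding L1Updater and tune step size, iterations, and regularization.
    val algorithm = new LogisticRegressionWithSGD()
    algorithm.optimizer
      .setNumIterations(200)
      .setStepSize(1.0)
      .setRegParam(0.1)
      .setUpdater(new L1Updater)
    val model = algorithm.run(training)
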
http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/python-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/python-programming-guide.md b/docs/python-programming-guide.md
index 39de603..98233bf 100644
--- a/docs/python-programming-guide.md
+++ b/docs/python-programming-guide.md
@@ -134,7 +134,7 @@ Files listed here will be added to the `PYTHONPATH` and shipped to remote worker
 Code dependencies can be added to an existing SparkContext using its `addPyFile()` method.
 
 You can set [configuration properties](configuration.html#spark-properties) by passing a
-[SparkConf](api/pyspark/pyspark.conf.SparkConf-class.html) object to SparkContext:
+[SparkConf](api/python/pyspark.conf.SparkConf-class.html) object to SparkContext:
 
 {% highlight python %}
 from pyspark import SparkConf, SparkContext
@@ -147,7 +147,7 @@ sc = SparkContext(conf = conf)
 
 # API Docs
 
-[API documentation](api/pyspark/index.html) for PySpark is available as Epydoc.
+[API documentation](api/python/index.html) for PySpark is available as Epydoc.
 Many of the methods also contain [doctests](http://docs.python.org/2/library/doctest.html) that provide additional usage examples.
 
 # Libraries

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/quick-start.md
----------------------------------------------------------------------
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 6b4f4ba..68afa6e 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -138,7 +138,9 @@ Spark README. Note that you'll need to replace YOUR_SPARK_HOME with the location
 installed. Unlike the earlier examples with the Spark shell, which initializes its own SparkContext,
 we initialize a SparkContext as part of the program.
 
-We pass the SparkContext constructor a SparkConf object which contains information about our
+We pass the SparkContext constructor a 
+[SparkConf](api/scala/index.html#org.apache.spark.SparkConf)
+object which contains information about our
 application. We also call sc.addJar to make sure that when our application is launched in cluster
 mode, the jar file containing it will be shipped automatically to worker nodes.
 
@@ -327,4 +329,4 @@ Congratulations on running your first Spark application!
 
 * For an in-depth overview of the API see "Programming Guides" menu section.
 * For running applications on a cluster head to the [deployment overview](cluster-overview.html).
-* For configuration options available to Spark applications see the [configuration page](configuration.html).
\ No newline at end of file
+* For configuration options available to Spark applications see the [configuration page](configuration.html).

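A condensed sketch of the SparkConf/SparkContext/addJar pattern described in the quick start text above; the application name, master URL, and jar path are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("Simple App")
      .setMaster("local")  // placeholder; on a cluster, pass that cluster's master URL
    val sc = new SparkContext(conf)
    // Ship the application jar to workers when running in cluster mode.
    sc.addJar("target/scala-2.10/simple-project_2.10-1.0.jar")  // placeholder path
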
http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/scala-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/scala-programming-guide.md b/docs/scala-programming-guide.md
index 4431da0..a317170 100644
--- a/docs/scala-programming-guide.md
+++ b/docs/scala-programming-guide.md
@@ -147,7 +147,7 @@ All transformations in Spark are <i>lazy</i>, in that they do not compute their
 
 By default, each transformed RDD is recomputed each time you run an action on it. However, you may also *persist* an RDD in memory using the `persist` (or `cache`) method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it. There is also support for persisting datasets on disk, or replicated across the cluster. The next section in this document describes these options.
 
-The following tables list the transformations and actions currently supported (see also the [RDD API doc](api/core/index.html#org.apache.spark.rdd.RDD) for details):
+The following tables list the transformations and actions currently supported (see also the [RDD API doc](api/scala/index.html#org.apache.spark.rdd.RDD) for details):
 
 ### Transformations
 
@@ -216,7 +216,7 @@ The following tables list the transformations and actions currently supported (s
 </tr>
 </table>
 
-A complete list of transformations is available in the [RDD API doc](api/core/index.html#org.apache.spark.rdd.RDD).
+A complete list of transformations is available in the [RDD API doc](api/scala/index.html#org.apache.spark.rdd.RDD).
 
 ### Actions
 
@@ -264,7 +264,7 @@ A complete list of transformations is available in the [RDD API doc](api/core/in
 </tr>
 </table>
 
-A complete list of actions is available in the [RDD API doc](api/core/index.html#org.apache.spark.rdd.RDD).
+A complete list of actions is available in the [RDD API doc](api/scala/index.html#org.apache.spark.rdd.RDD).
 
 ## RDD Persistence
 
@@ -283,7 +283,7 @@ In addition, each RDD can be stored using a different *storage level*, allowing
 persist the dataset on disk, or persist it in memory but as serialized Java objects (to save space),
 or replicate it across nodes, or store the data in off-heap memory in [Tachyon](http://tachyon-project.org/).
 These levels are chosen by passing a
-[`org.apache.spark.storage.StorageLevel`](api/core/index.html#org.apache.spark.storage.StorageLevel)
+[`org.apache.spark.storage.StorageLevel`](api/scala/index.html#org.apache.spark.storage.StorageLevel)
 object to `persist()`. The `cache()` method is a shorthand for using the default storage level,
 which is `StorageLevel.MEMORY_ONLY` (store deserialized objects in memory). The complete set of
 available storage levels is:
@@ -355,7 +355,7 @@ waiting to recompute a lost partition.
 
 If you want to define your own storage level (say, with replication factor of 3 instead of 2), then
 use the factory method `apply()` of the
-[`StorageLevel`](api/core/index.html#org.apache.spark.storage.StorageLevel$) singleton object.
+[`StorageLevel`](api/scala/index.html#org.apache.spark.storage.StorageLevel$) singleton object.
 
 Spark has a block manager inside the Executors that lets you choose memory, disk, or off-heap. The
 latter is for storing RDDs off-heap outside the Executor JVM on top of the memory management system

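For illustration, persisting an RDD at an explicit storage level versus the cache() shorthand; `sc` is an existing SparkContext and the input path is a placeholder:

    import org.apache.spark.storage.StorageLevel

    val lines = sc.textFile("hdfs://...")  // placeholder path
    // Serialized in-memory storage that spills partitions to disk when memory is short.
    lines.persist(StorageLevel.MEMORY_AND_DISK_SER)
    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY): deserialized objects in memory.
    val words = lines.flatMap(_.split(" ")).cache()
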
http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 8e98cc0..e25379b 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -14,8 +14,8 @@ title: Spark SQL Programming Guide
 
 Spark SQL allows relational queries expressed in SQL, HiveQL, or Scala to be executed using
 Spark.  At the core of this component is a new type of RDD,
-[SchemaRDD](api/sql/core/index.html#org.apache.spark.sql.SchemaRDD).  SchemaRDDs are composed
-[Row](api/sql/catalyst/index.html#org.apache.spark.sql.catalyst.expressions.Row) objects along with
+[SchemaRDD](api/scala/index.html#org.apache.spark.sql.SchemaRDD).  SchemaRDDs are composed
+[Row](api/scala/index.html#org.apache.spark.sql.catalyst.expressions.Row) objects along with
 a schema that describes the data types of each column in the row.  A SchemaRDD is similar to a table
 in a traditional relational database.  A SchemaRDD can be created from an existing RDD, parquet
 file, or by running HiveQL against data stored in [Apache Hive](http://hive.apache.org/).
@@ -27,8 +27,8 @@ file, or by running HiveQL against data stored in [Apache Hive](http://hive.apac
 <div data-lang="java"  markdown="1">
 Spark SQL allows relational queries expressed in SQL, HiveQL, or Scala to be executed using
 Spark.  At the core of this component is a new type of RDD,
-[JavaSchemaRDD](api/sql/core/index.html#org.apache.spark.sql.api.java.JavaSchemaRDD).  JavaSchemaRDDs are composed
-[Row](api/sql/catalyst/index.html#org.apache.spark.sql.api.java.Row) objects along with
+[JavaSchemaRDD](api/scala/index.html#org.apache.spark.sql.api.java.JavaSchemaRDD).  JavaSchemaRDDs are composed
+[Row](api/scala/index.html#org.apache.spark.sql.api.java.Row) objects along with
 a schema that describes the data types of each column in the row.  A JavaSchemaRDD is similar to a table
 in a traditional relational database.  A JavaSchemaRDD can be created from an existing RDD, parquet
 file, or by running HiveQL against data stored in [Apache Hive](http://hive.apache.org/).
@@ -38,8 +38,8 @@ file, or by running HiveQL against data stored in [Apache Hive](http://hive.apac
 
 Spark SQL allows relational queries expressed in SQL or HiveQL to be executed using
 Spark.  At the core of this component is a new type of RDD,
-[SchemaRDD](api/pyspark/pyspark.sql.SchemaRDD-class.html).  SchemaRDDs are composed
-[Row](api/pyspark/pyspark.sql.Row-class.html) objects along with
+[SchemaRDD](api/python/pyspark.sql.SchemaRDD-class.html).  SchemaRDDs are composed
+[Row](api/python/pyspark.sql.Row-class.html) objects along with
 a schema that describes the data types of each column in the row.  A SchemaRDD is similar to a table
 in a traditional relational database.  A SchemaRDD can be created from an existing RDD, parquet
 file, or by running HiveQL against data stored in [Apache Hive](http://hive.apache.org/).
@@ -56,7 +56,7 @@ file, or by running HiveQL against data stored in [Apache Hive](http://hive.apac
 <div data-lang="scala"  markdown="1">
 
 The entry point into all relational functionality in Spark is the
-[SQLContext](api/sql/core/index.html#org.apache.spark.sql.SQLContext) class, or one of its
+[SQLContext](api/scala/index.html#org.apache.spark.sql.SQLContext) class, or one of its
 descendants.  To create a basic SQLContext, all you need is a SparkContext.
 
 {% highlight scala %}
@@ -72,7 +72,7 @@ import sqlContext._
 <div data-lang="java" markdown="1">
 
 The entry point into all relational functionality in Spark is the
-[JavaSQLContext](api/sql/core/index.html#org.apache.spark.sql.api.java.JavaSQLContext) class, or one
+[JavaSQLContext](api/scala/index.html#org.apache.spark.sql.api.java.JavaSQLContext) class, or one
 of its descendants.  To create a basic JavaSQLContext, all you need is a JavaSparkContext.
 
 {% highlight java %}
@@ -85,7 +85,7 @@ JavaSQLContext sqlCtx = new org.apache.spark.sql.api.java.JavaSQLContext(ctx);
 <div data-lang="python"  markdown="1">
 
 The entry point into all relational functionality in Spark is the
-[SQLContext](api/pyspark/pyspark.sql.SQLContext-class.html) class, or one
+[SQLContext](api/python/pyspark.sql.SQLContext-class.html) class, or one
 of its descendants.  To create a basic SQLContext, all you need is a SparkContext.
 
 {% highlight python %}
@@ -331,7 +331,7 @@ val teenagers = people.where('age >= 10).where('age <= 19).select('name)
 The DSL uses Scala symbols to represent columns in the underlying table, which are identifiers
 prefixed with a tick (`'`).  Implicit conversions turn these symbols into expressions that are
 evaluated by the SQL execution engine.  A full list of the functions supported can be found in the
-[ScalaDoc](api/sql/core/index.html#org.apache.spark.sql.SchemaRDD).
+[ScalaDoc](api/scala/index.html#org.apache.spark.sql.SchemaRDD).
 
 <!-- TODO: Include the table of operations here. -->
 

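A compact sketch of the SQLContext and SchemaRDD DSL usage described above, assuming an existing SparkContext `sc`; the case class and data are placeholders:

    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)

    val sqlContext = new SQLContext(sc)
    import sqlContext._  // brings in the implicit conversion from RDDs of case classes

    val people = sc.parallelize(Seq(Person("Alice", 15), Person("Bob", 25)))
    // Tick-prefixed Scala symbols ('age, 'name) refer to columns in the DSL.
    val teenagers = people.where('age >= 10).where('age <= 19).select('name)
    teenagers.collect().foreach(println)
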
http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/streaming-custom-receivers.md
----------------------------------------------------------------------
diff --git a/docs/streaming-custom-receivers.md b/docs/streaming-custom-receivers.md
index 3fb540c..3cfa451 100644
--- a/docs/streaming-custom-receivers.md
+++ b/docs/streaming-custom-receivers.md
@@ -9,7 +9,7 @@ This guide shows the programming model and features by walking through a simple
 
 ### Writing a Simple Receiver
 
-This starts with implementing [NetworkReceiver](api/streaming/index.html#org.apache.spark.streaming.dstream.NetworkReceiver).
+This starts with implementing [NetworkReceiver](api/scala/index.html#org.apache.spark.streaming.dstream.NetworkReceiver).
 
 The following is a simple socket text-stream receiver.
 
@@ -125,4 +125,4 @@ _A more comprehensive example is provided in the spark streaming examples_
 ## References
 
 1.[Akka Actor documentation](http://doc.akka.io/docs/akka/2.0.5/scala/actors.html)
-2.[NetworkReceiver](api/streaming/index.html#org.apache.spark.streaming.dstream.NetworkReceiver)
+2.[NetworkReceiver](api/scala/index.html#org.apache.spark.streaming.dstream.NetworkReceiver)

http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index f9904d4..946d6c4 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -40,7 +40,7 @@ Spark Streaming provides a high-level abstraction called *discretized stream* or
 which represents a continuous stream of data. DStreams can be created either from input data
 stream from sources such as Kafka and Flume, or by applying high-level
 operations on other DStreams. Internally, a DStream is represented as a sequence of
-[RDDs](api/core/index.html#org.apache.spark.rdd.RDD).
+[RDDs](api/scala/index.html#org.apache.spark.rdd.RDD).
 
 This guide shows you how to start writing Spark Streaming programs with DStreams. You can
 write Spark Streaming programs in Scala or Java, both of which are presented in this guide. You
@@ -62,7 +62,7 @@ First, we import the names of the Spark Streaming classes, and some implicit
 conversions from StreamingContext into our environment, to add useful methods to
 other classes we need (like DStream).
 
-[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) is the
+[StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) is the
 main entry point for all streaming functionality.
 
 {% highlight scala %}
@@ -71,7 +71,7 @@ import org.apache.spark.streaming.StreamingContext._
 {% endhighlight %}
 
 Then we create a
-[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object.
+[StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) object.
 Besides Spark's configuration, we specify that any DStream will be processed
 in 1 second batches.
 
@@ -132,7 +132,7 @@ The complete code can be found in the Spark Streaming example
 <div data-lang="java" markdown="1">
 
 First, we create a
-[JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext) object,
+[JavaStreamingContext](api/scala/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext) object,
 which is the main entry point for all streaming
 functionality. Besides Spark's configuration, we specify that any DStream would be processed
 in 1 second batches.
@@ -168,7 +168,7 @@ JavaDStream<String> words = lines.flatMap(
 generating multiple new records from each record in the source DStream. In this case,
 each line will be split into multiple words and the stream of words is represented as the
 `words` DStream. Note that we defined the transformation using a
-[FlatMapFunction](api/core/index.html#org.apache.spark.api.java.function.FlatMapFunction) object.
+[FlatMapFunction](api/scala/index.html#org.apache.spark.api.java.function.FlatMapFunction) object.
 As we will discover along the way, there are a number of such convenience classes in the Java API
 that help define DStream transformations.
 
@@ -192,9 +192,9 @@ wordCounts.print();     // Print a few of the counts to the console
 {% endhighlight %}
 
 The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
-1)` pairs, using a [PairFunction](api/core/index.html#org.apache.spark.api.java.function.PairFunction)
+1)` pairs, using a [PairFunction](api/scala/index.html#org.apache.spark.api.java.function.PairFunction)
 object. Then, it is reduced to get the frequency of words in each batch of data,
-using a [Function2](api/core/index.html#org.apache.spark.api.java.function.Function2) object.
+using a [Function2](api/scala/index.html#org.apache.spark.api.java.function.Function2) object.
 Finally, `wordCounts.print()` will print a few of the counts generated every second.
 
 Note that when these lines are executed, Spark Streaming only sets up the computation it
@@ -333,7 +333,7 @@ for the full list of supported sources and artifacts.
 <div data-lang="scala" markdown="1">
 
 To initialize a Spark Streaming program in Scala, a
-[`StreamingContext`](api/streaming/index.html#org.apache.spark.streaming.StreamingContext)
+[`StreamingContext`](api/scala/index.html#org.apache.spark.streaming.StreamingContext)
 object has to be created, which is the main entry point of all Spark Streaming functionality.
 A `StreamingContext` object can be created by using
 
@@ -344,7 +344,7 @@ new StreamingContext(master, appName, batchDuration, [sparkHome], [jars])
 <div data-lang="java" markdown="1">
 
 To initialize a Spark Streaming program in Java, a
-[`JavaStreamingContext`](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext)
+[`JavaStreamingContext`](api/scala/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext)
 object has to be created, which is the main entry point of all Spark Streaming functionality.
 A `JavaStreamingContext` object can be created by using
 
@@ -431,8 +431,8 @@ and process any files created in that directory. Note that
 
 For more details on streams from files, Akka actors and sockets,
 see the API documentation of the relevant functions in
-[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) for
-Scala and [JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext)
+[StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) for
+Scala and [JavaStreamingContext](api/scala/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext)
  for Java.
 
 Additional functionality for creating DStreams from sources such as Kafka, Flume, and Twitter
@@ -802,10 +802,10 @@ output operators are defined:
 
 
 The complete list of DStream operations is available in the API documentation. For the Scala API,
-see [DStream](api/streaming/index.html#org.apache.spark.streaming.dstream.DStream)
-and [PairDStreamFunctions](api/streaming/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions).
-For the Java API, see [JavaDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.dstream.DStream)
-and [JavaPairDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaPairDStream).
+see [DStream](api/scala/index.html#org.apache.spark.streaming.dstream.DStream)
+and [PairDStreamFunctions](api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions).
+For the Java API, see [JavaDStream](api/scala/index.html#org.apache.spark.streaming.api.java.JavaDStream)
+and [JavaPairDStream](api/scala/index.html#org.apache.spark.streaming.api.java.JavaPairDStream).
 Specifically for the Java API, see [Spark's Java programming guide](java-programming-guide.html)
 for more information.
 
@@ -881,7 +881,7 @@ Cluster resources maybe under-utilized if the number of parallel tasks used in a
 computation is not high enough. For example, for distributed reduce operations like `reduceByKey`
 and `reduceByKeyAndWindow`, the default number of parallel tasks is 8. You can pass the level of
 parallelism as an argument (see the
-[`PairDStreamFunctions`](api/streaming/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions)
+[`PairDStreamFunctions`](api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions)
 documentation), or set the [config property](configuration.html#spark-properties)
 `spark.default.parallelism` to change the default.
 
@@ -925,7 +925,7 @@ A good approach to figure out the right batch size for your application is to te
 conservative batch size (say, 5-10 seconds) and a low data rate. To verify whether the system
 is able to keep up with data rate, you can check the value of the end-to-end delay experienced
 by each processed batch (either look for "Total delay" in Spark driver log4j logs, or use the
-[StreamingListener](api/streaming/index.html#org.apache.spark.streaming.scheduler.StreamingListener)
+[StreamingListener](api/scala/index.html#org.apache.spark.streaming.scheduler.StreamingListener)
 interface).
 If the delay is maintained to be comparable to the batch size, then the system is stable. Otherwise,
 if the delay is continuously increasing, it means that the system is unable to keep up and it
@@ -952,7 +952,7 @@ exception saying so.
 ## Monitoring
 Besides Spark's in-built [monitoring capabilities](monitoring.html),
 the progress of a Spark Streaming program can also be monitored using the [StreamingListener]
-(api/streaming/index.html#org.apache.spark.scheduler.StreamingListener) interface,
+(api/scala/index.html#org.apache.spark.streaming.scheduler.StreamingListener) interface,
 which allows you to get statistics of batch processing times, queueing delays,
 and total end-to-end delays. Note that this is still an experimental API and it is likely to be
 improved upon (i.e., more information reported) in the future.
@@ -965,9 +965,9 @@ in Spark Streaming applications and achieving more consistent batch processing t
 
 * **Default persistence level of DStreams**: Unlike RDDs, the default persistence level of DStreams
 serializes the data in memory (that is,
-[StorageLevel.MEMORY_ONLY_SER](api/core/index.html#org.apache.spark.storage.StorageLevel$) for
+[StorageLevel.MEMORY_ONLY_SER](api/scala/index.html#org.apache.spark.storage.StorageLevel$) for
 DStream compared to
-[StorageLevel.MEMORY_ONLY](api/core/index.html#org.apache.spark.storage.StorageLevel$) for RDDs).
+[StorageLevel.MEMORY_ONLY](api/scala/index.html#org.apache.spark.storage.StorageLevel$) for RDDs).
 Even though keeping the data serialized incurs higher serialization/deserialization overheads,
 it significantly reduces GC pauses.
 
@@ -1244,15 +1244,15 @@ and output 30 after recovery.
 # Where to Go from Here
 
 * API documentation
-  - Main docs of StreamingContext and DStreams in [Scala](api/streaming/index.html#org.apache.spark.streaming.package)
-    and [Java](api/streaming/index.html#org.apache.spark.streaming.api.java.package)
+  - Main docs of StreamingContext and DStreams in [Scala](api/scala/index.html#org.apache.spark.streaming.package)
+    and [Java](api/scala/index.html#org.apache.spark.streaming.api.java.package)
   - Additional docs for
-    [Kafka](api/external/kafka/index.html#org.apache.spark.streaming.kafka.KafkaUtils$),
-    [Flume](api/external/flume/index.html#org.apache.spark.streaming.flume.FlumeUtils$),
-    [Twitter](api/external/twitter/index.html#org.apache.spark.streaming.twitter.TwitterUtils$),
-    [ZeroMQ](api/external/zeromq/index.html#org.apache.spark.streaming.zeromq.ZeroMQUtils$), and
-    [MQTT](api/external/mqtt/index.html#org.apache.spark.streaming.mqtt.MQTTUtils$)
+    [Kafka](api/scala/index.html#org.apache.spark.streaming.kafka.KafkaUtils$),
+    [Flume](api/scala/index.html#org.apache.spark.streaming.flume.FlumeUtils$),
+    [Twitter](api/scala/index.html#org.apache.spark.streaming.twitter.TwitterUtils$),
+    [ZeroMQ](api/scala/index.html#org.apache.spark.streaming.zeromq.ZeroMQUtils$), and
+    [MQTT](api/scala/index.html#org.apache.spark.streaming.mqtt.MQTTUtils$)
 
 * More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples)
   and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/streaming/examples)
-* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) describing Spark Streaming
+* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) describing Spark Streaming.

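Pulling the streaming pieces above together, a minimal network word-count sketch in Scala; it assumes local execution with two threads and a text source on localhost:9999 (for example `nc -lk 9999`):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._  // implicit pair DStream operations

    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))  // 1-second batches

    val lines = ssc.socketTextStream("localhost", 9999)
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
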
http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/docs/tuning.md
----------------------------------------------------------------------
diff --git a/docs/tuning.md b/docs/tuning.md
index cc069f0..78e1077 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -48,7 +48,7 @@ Spark automatically includes Kryo serializers for the many commonly-used core Sc
 in the AllScalaRegistrar from the [Twitter chill](https://github.com/twitter/chill) library.
 
 To register your own custom classes with Kryo, create a public class that extends
-[`org.apache.spark.serializer.KryoRegistrator`](api/core/index.html#org.apache.spark.serializer.KryoRegistrator) and set the
+[`org.apache.spark.serializer.KryoRegistrator`](api/scala/index.html#org.apache.spark.serializer.KryoRegistrator) and set the
 `spark.kryo.registrator` config property to point to it, as follows:
 
 {% highlight scala %}
@@ -222,7 +222,7 @@ enough. Spark automatically sets the number of "map" tasks to run on each file a
 (though you can control it through optional parameters to `SparkContext.textFile`, etc), and for
 distributed "reduce" operations, such as `groupByKey` and `reduceByKey`, it uses the largest
 parent RDD's number of partitions. You can pass the level of parallelism as a second argument
-(see the [`spark.PairRDDFunctions`](api/core/index.html#org.apache.spark.rdd.PairRDDFunctions) documentation),
+(see the [`spark.PairRDDFunctions`](api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions) documentation),
 or set the config property `spark.default.parallelism` to change the default.
 In general, we recommend 2-3 tasks per CPU core in your cluster.
 

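Two small hedged sketches of the tuning knobs discussed above: enabling Kryo with a registrator, and passing an explicit level of parallelism to a shuffle operation. The registrator class name is hypothetical and `pairs` stands in for an existing pair RDD:

    import org.apache.spark.SparkConf

    // Kryo serialization: "mypackage.MyRegistrator" is a hypothetical KryoRegistrator subclass.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "mypackage.MyRegistrator")

    // Level of parallelism: request 48 reduce tasks explicitly instead of the default.
    val counts = pairs.reduceByKey(_ + _, 48)
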
http://git-wip-us.apache.org/repos/asf/spark/blob/fc783847/project/SparkBuild.scala
----------------------------------------------------------------------
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 33f9d64..f115f0d 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -23,6 +23,8 @@ import AssemblyKeys._
 import scala.util.Properties
 import org.scalastyle.sbt.ScalastylePlugin.{Settings => ScalaStyleSettings}
 import com.typesafe.tools.mima.plugin.MimaKeys.previousArtifact
+import sbtunidoc.Plugin._
+import UnidocKeys._
 
 import scala.collection.JavaConversions._
 
@@ -31,6 +33,7 @@ import scala.collection.JavaConversions._
 
 object SparkBuild extends Build {
   val SPARK_VERSION = "1.0.0-SNAPSHOT"
+  val SPARK_VERSION_SHORT = SPARK_VERSION.replaceAll("-SNAPSHOT", "")
 
   // Hadoop version to build against. For example, "1.0.4" for Apache releases, or
   // "2.0.0-mr1-cdh4.2.0" for Cloudera Hadoop. Note that these variables can be set
@@ -184,12 +187,17 @@ object SparkBuild extends Build {
     // Show full stack trace and duration in test cases.
     testOptions in Test += Tests.Argument("-oDF"),
     // Remove certain packages from Scaladoc
-    scalacOptions in (Compile,doc) := Seq("-groups", "-skip-packages", Seq(
-      "akka",
-      "org.apache.spark.network",
-      "org.apache.spark.deploy",
-      "org.apache.spark.util.collection"
-      ).mkString(":")),
+    scalacOptions in (Compile, doc) := Seq(
+      "-groups",
+      "-skip-packages", Seq(
+        "akka",
+        "org.apache.spark.api.python",
+        "org.apache.spark.network",
+        "org.apache.spark.deploy",
+        "org.apache.spark.util.collection"
+      ).mkString(":"),
+      "-doc-title", "Spark " + SPARK_VERSION_SHORT + " ScalaDoc"
+    ),
 
     // Only allow one test at a time, even across projects, since they run in the same JVM
     concurrentRestrictions in Global += Tags.limit(Tags.Test, 1),
@@ -283,7 +291,7 @@ object SparkBuild extends Build {
     publishMavenStyle in MavenCompile := true,
     publishLocal in MavenCompile <<= publishTask(publishLocalConfiguration in MavenCompile, deliverLocal),
     publishLocalBoth <<= Seq(publishLocal in MavenCompile, publishLocal).dependOn
-  ) ++ net.virtualvoid.sbt.graph.Plugin.graphSettings ++ ScalaStyleSettings
+  ) ++ net.virtualvoid.sbt.graph.Plugin.graphSettings ++ ScalaStyleSettings ++ genjavadocSettings
 
   val akkaVersion = "2.2.3-shaded-protobuf"
   val chillVersion = "0.3.1"
@@ -349,15 +357,57 @@ object SparkBuild extends Build {
     libraryDependencies ++= maybeAvro
   )
 
-  def rootSettings = sharedSettings ++ Seq(
-    publish := {}
+  // Create a colon-separate package list adding "org.apache.spark" in front of all of them,
+  // for easier specification of JavaDoc package groups
+  def packageList(names: String*): String = {
+    names.map(s => "org.apache.spark." + s).mkString(":")
+  }
+
+  def rootSettings = sharedSettings ++ scalaJavaUnidocSettings ++ Seq(
+    publish := {},
+
+    unidocProjectFilter in (ScalaUnidoc, unidoc) :=
+      inAnyProject -- inProjects(repl, examples, tools, yarn, yarnAlpha),
+    unidocProjectFilter in (JavaUnidoc, unidoc) :=
+      inAnyProject -- inProjects(repl, examples, bagel, graphx, catalyst, tools, yarn, yarnAlpha),
+
+    // Skip class names containing $ and some internal packages in Javadocs
+    unidocAllSources in (JavaUnidoc, unidoc) := {
+      (unidocAllSources in (JavaUnidoc, unidoc)).value
+        .map(_.filterNot(_.getName.contains("$")))
+        .map(_.filterNot(_.getCanonicalPath.contains("akka")))
+        .map(_.filterNot(_.getCanonicalPath.contains("deploy")))
+        .map(_.filterNot(_.getCanonicalPath.contains("network")))
+        .map(_.filterNot(_.getCanonicalPath.contains("executor")))
+        .map(_.filterNot(_.getCanonicalPath.contains("python")))
+        .map(_.filterNot(_.getCanonicalPath.contains("collection")))
+    },
+
+    // Javadoc options: create a window title, and group key packages on index page
+    javacOptions in doc := Seq(
+      "-windowtitle", "Spark " + SPARK_VERSION_SHORT + " JavaDoc",
+      "-public",
+      "-group", "Core Java API", packageList("api.java", "api.java.function"),
+      "-group", "Spark Streaming", packageList(
+        "streaming.api.java", "streaming.flume", "streaming.kafka",
+        "streaming.mqtt", "streaming.twitter", "streaming.zeromq"
+      ),
+      "-group", "MLlib", packageList(
+        "mllib.classification", "mllib.clustering", "mllib.evaluation.binary", "mllib.linalg",
+        "mllib.linalg.distributed", "mllib.optimization", "mllib.rdd", "mllib.recommendation",
+        "mllib.regression", "mllib.stat", "mllib.tree", "mllib.tree.configuration",
+        "mllib.tree.impurity", "mllib.tree.model", "mllib.util"
+      ),
+      "-group", "Spark SQL", packageList("sql.api.java", "sql.hive.api.java"),
+      "-noqualifier", "java.lang"
+    )
   )
 
   def replSettings = sharedSettings ++ Seq(
     name := "spark-repl",
-   libraryDependencies <+= scalaVersion(v => "org.scala-lang"  % "scala-compiler" % v ),
-   libraryDependencies <+= scalaVersion(v => "org.scala-lang"  % "jline"          % v ),
-   libraryDependencies <+= scalaVersion(v => "org.scala-lang"  % "scala-reflect"  % v )
+    libraryDependencies <+= scalaVersion(v => "org.scala-lang"  % "scala-compiler" % v),
+    libraryDependencies <+= scalaVersion(v => "org.scala-lang"  % "jline"          % v),
+    libraryDependencies <+= scalaVersion(v => "org.scala-lang"  % "scala-reflect"  % v)
   )
 
   def examplesSettings = sharedSettings ++ Seq(