Posted to commits@spark.apache.org by ma...@apache.org on 2013/09/01 23:59:39 UTC

[55/69] [abbrv] git commit: Update docs for new package

Update docs for new package


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/4f422032
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/4f422032
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/4f422032

Branch: refs/heads/branch-0.8
Commit: 4f422032e507d698b9c717b5228154d4527a639a
Parents: 4d1cb59
Author: Matei Zaharia <ma...@eecs.berkeley.edu>
Authored: Sat Aug 31 22:17:40 2013 -0700
Committer: Matei Zaharia <ma...@eecs.berkeley.edu>
Committed: Sun Sep 1 14:13:15 2013 -0700

----------------------------------------------------------------------
 docs/bagel-programming-guide.md     |  6 +++---
 docs/configuration.md               | 18 +++++++++---------
 docs/index.md                       |  8 ++++----
 docs/java-programming-guide.md      | 28 ++++++++++++++--------------
 docs/quick-start.md                 | 10 +++++-----
 docs/scala-programming-guide.md     | 16 ++++++++--------
 docs/streaming-programming-guide.md | 28 ++++++++++++++--------------
 7 files changed, 57 insertions(+), 57 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4f422032/docs/bagel-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/bagel-programming-guide.md b/docs/bagel-programming-guide.md
index c526da3..20b1e9b 100644
--- a/docs/bagel-programming-guide.md
+++ b/docs/bagel-programming-guide.md
@@ -29,8 +29,8 @@ representing the current PageRank of the vertex, and similarly extend
 the `Message` and `Edge` classes. Note that these need to be marked `@serializable` to allow Spark to transfer them across machines. We also import the Bagel types and implicit conversions.
 
 {% highlight scala %}
-import spark.bagel._
-import spark.bagel.Bagel._
+import org.apache.spark.bagel._
+import org.apache.spark.bagel.Bagel._
 
 @serializable class PREdge(val targetId: String) extends Edge
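// Illustrative sketch (not part of this commit): under the same Bagel API,
// the matching vertex and message classes might look roughly like this.
@serializable class PRVertex(val value: Double, val outEdges: Array[PREdge],
  val active: Boolean) extends Vertex

@serializable class PRMessage(val targetId: String, val value: Double)
  extends Message[String]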
 
@@ -158,4 +158,4 @@ trait Message[K] {
 
 ## Where to Go from Here
 
-Two example jobs, PageRank and shortest path, are included in `examples/src/main/scala/spark/examples/bagel`. You can run them by passing the class name to the `run-example` script included in Spark -- for example, `./run-example spark.examples.bagel.WikipediaPageRank`. Each example program prints usage help when run without any arguments.
+Two example jobs, PageRank and shortest path, are included in `examples/src/main/scala/org/apache/spark/examples/bagel`. You can run them by passing the class name to the `run-example` script included in Spark -- for example, `./run-example org.apache.spark.examples.bagel.WikipediaPageRank`. Each example program prints usage help when run without any arguments.

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4f422032/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 1c0492e..55df18b 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -36,13 +36,13 @@ there are at least five properties that you will commonly want to control:
 </tr>
 <tr>
   <td>spark.serializer</td>
-  <td>spark.JavaSerializer</td>
+  <td>org.apache.spark.JavaSerializer</td>
   <td>
     Class to use for serializing objects that will be sent over the network or need to be cached
     in serialized form. The default of Java serialization works with any Serializable Java object but is
-    quite slow, so we recommend <a href="tuning.html">using <code>spark.KryoSerializer</code>
+    quite slow, so we recommend <a href="tuning.html">using <code>org.apache.spark.KryoSerializer</code>
     and configuring Kryo serialization</a> when speed is necessary. Can be any subclass of
-    <a href="api/core/index.html#spark.Serializer"><code>spark.Serializer</code></a>.
+    <a href="api/core/index.html#org.apache.spark.Serializer"><code>org.apache.spark.Serializer</code></a>.
   </td>
 </tr>
 <tr>
@@ -50,8 +50,8 @@ there are at least five properties that you will commonly want to control:
   <td>(none)</td>
   <td>
     If you use Kryo serialization, set this class to register your custom classes with Kryo.
-    You need to set it to a class that extends
-    <a href="api/core/index.html#spark.KryoRegistrator"><code>spark.KryoRegistrator</code></a>.
+    It should be set to a class that extends
+    <a href="api/core/index.html#org.apache.spark.KryoRegistrator"><code>KryoRegistrator</code></a>.
     See the <a href="tuning.html#data-serialization">tuning guide</a> for more details.
   </td>
 </tr>
@@ -147,10 +147,10 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td>spark.io.compression.codec</td>
-  <td>spark.io.SnappyCompressionCodec</td>
+  <td>org.apache.spark.io.<br />SnappyCompressionCodec</td>
   <td>
     The compression codec class to use for various compressions. By default, Spark provides two
-    codecs: <code>spark.io.LZFCompressionCodec</code> and <code>spark.io.SnappyCompressionCodec</code>.
+    codecs: <code>org.apache.spark.io.LZFCompressionCodec</code> and <code>org.apache.spark.io.SnappyCompressionCodec</code>.
   </td>
 </tr>
 <tr>
@@ -171,7 +171,7 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td>spark.closure.serializer</td>
-  <td>spark.JavaSerializer</td>
+  <td>org.apache.spark.JavaSerializer</td>
   <td>
     Serializer class to use for closures. Generally Java is fine unless your distributed functions
     (e.g. map functions) reference large objects in the driver program.
@@ -198,7 +198,7 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td>spark.broadcast.factory</td>
-  <td>spark.broadcast.HttpBroadcastFactory</td>
+  <td>org.apache.spark.broadcast.<br />HttpBroadcastFactory</td>
   <td>
     Which broadcast implementation to use.
   </td>
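To make the renamed classes above concrete, here is a minimal sketch of Kryo configuration under the new package, following the class names this page documents (the `MyVector` and `MyRegistrator` names are placeholders; the properties must be set before the SparkContext is created):

{% highlight scala %}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.KryoRegistrator

// Placeholder application class and its registrator
class MyVector(val data: Array[Double])

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyVector])
  }
}

// Set before creating the SparkContext
System.setProperty("spark.serializer", "org.apache.spark.KryoSerializer")
System.setProperty("spark.kryo.registrator", "MyRegistrator")
{% endhighlight %}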

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4f422032/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index 0ea0e10..35a597d 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -3,8 +3,8 @@ layout: global
 title: Spark Overview
 ---
 
-Apache Spark is a cluster computing system that aims to make data analytics faster to run and faster to write.
-It provides high-level APIs in [Scala](scala-programming-guide.html), [Java](java-programming-guide.html), and [Python](python-programming-guide.html), and a general execution engine that supports rich operator graphs.
+Apache Spark is a fast and general-purpose cluster computing system.
+It provides high-level APIs in [Scala](scala-programming-guide.html), [Java](java-programming-guide.html), and [Python](python-programming-guide.html) that make parallel jobs easy to write, and an optimized engine that supports general computation graphs.
 Spark can run on the Apache Mesos cluster manager, Hadoop YARN, Amazon EC2, or without an independent resource manager ("standalone mode").
 
 # Downloading
@@ -24,8 +24,8 @@ For its Scala API, Spark {{site.SPARK_VERSION}} depends on Scala {{site.SCALA_VE
 Spark comes with several sample programs in the `examples` directory.
 To run one of the samples, use `./run-example <class> <params>` in the top-level Spark directory
 (the `run-example` script sets up the appropriate paths and launches that program).
-For example, `./run-example spark.examples.SparkPi` will run a sample program that estimates Pi. Each
-example prints usage help if no params are given.
+For example, try `./run-example org.apache.spark.examples.SparkPi local`.
+Each example prints usage help when run with no parameters.
 
 Note that all of the sample programs take a `<master>` parameter specifying the cluster URL
 to connect to. This can be a [URL for a distributed cluster](scala-programming-guide.html#master-urls),

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4f422032/docs/java-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/java-programming-guide.md b/docs/java-programming-guide.md
index dd19a5f..48bf366 100644
--- a/docs/java-programming-guide.md
+++ b/docs/java-programming-guide.md
@@ -10,9 +10,9 @@ easy to follow even if you don't know Scala.
 This guide will show how to use the Spark features described there in Java.
 
 The Spark Java API is defined in the
-[`spark.api.java`](api/core/index.html#spark.api.java.package) package, and includes
-a [`JavaSparkContext`](api/core/index.html#spark.api.java.JavaSparkContext) for
-initializing Spark and [`JavaRDD`](api/core/index.html#spark.api.java.JavaRDD) classes,
+[`org.apache.spark.api.java`](api/core/index.html#org.apache.spark.api.java.package) package, and includes
+a [`JavaSparkContext`](api/core/index.html#org.apache.spark.api.java.JavaSparkContext) for
+initializing Spark and [`JavaRDD`](api/core/index.html#org.apache.spark.api.java.JavaRDD) classes,
 which support the same methods as their Scala counterparts but take Java functions and return
 Java data and collection types. The main differences have to do with passing functions to RDD
 operations (e.g. map) and handling RDDs of different types, as discussed next.
@@ -23,12 +23,12 @@ There are a few key differences between the Java and Scala APIs:
 
 * Java does not support anonymous or first-class functions, so functions must
   be implemented by extending the
-  [`spark.api.java.function.Function`](api/core/index.html#spark.api.java.function.Function),
-  [`Function2`](api/core/index.html#spark.api.java.function.Function2), etc.
+  [`org.apache.spark.api.java.function.Function`](api/core/index.html#org.apache.spark.api.java.function.Function),
+  [`Function2`](api/core/index.html#org.apache.spark.api.java.function.Function2), etc.
   classes.
 * To maintain type safety, the Java API defines specialized Function and RDD
   classes for key-value pairs and doubles. For example, 
-  [`JavaPairRDD`](api/core/index.html#spark.api.java.JavaPairRDD)
+  [`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD)
   stores key-value pairs.
* RDD methods like `collect()` and `countByKey()` return Java collection types,
   such as `java.util.List` and `java.util.Map`.
@@ -44,8 +44,8 @@ In the Scala API, these methods are automatically added using Scala's
 [implicit conversions](http://www.scala-lang.org/node/130) mechanism.
 
 In the Java API, the extra methods are defined in the
-[`JavaPairRDD`](api/core/index.html#spark.api.java.JavaPairRDD)
-and [`JavaDoubleRDD`](api/core/index.html#spark.api.java.JavaDoubleRDD)
+[`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD)
+and [`JavaDoubleRDD`](api/core/index.html#org.apache.spark.api.java.JavaDoubleRDD)
 classes.  RDD methods like `map` are overloaded by specialized `PairFunction`
 and `DoubleFunction` classes, allowing them to return RDDs of the appropriate
 types.  Common methods like `filter` and `sample` are implemented by
@@ -75,7 +75,7 @@ class has a single abstract method, `call()`, that must be implemented.
 ## Storage Levels
 
 RDD [storage level](scala-programming-guide.html#rdd-persistence) constants, such as `MEMORY_AND_DISK`, are
-declared in the [spark.api.java.StorageLevels](api/core/index.html#spark.api.java.StorageLevels) class. To
+declared in the [org.apache.spark.api.java.StorageLevels](api/core/index.html#org.apache.spark.api.java.StorageLevels) class. To
 define your own storage level, you can use StorageLevels.create(...). 
 
 
@@ -92,8 +92,8 @@ The Java API supports other Spark features, including
 As an example, we will implement word count using the Java API.
 
 {% highlight java %}
-import spark.api.java.*;
-import spark.api.java.function.*;
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.*;
 
 JavaSparkContext sc = new JavaSparkContext(...);
JavaRDD<String> lines = sc.textFile("hdfs://...");
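// Illustrative continuation (elided from this hunk): the word count proceeds
// roughly as follows; java.util.Arrays and scala.Tuple2 are assumed imported.
JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
  public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
});
JavaPairRDD<String, Integer> ones = words.map(new PairFunction<String, String, Integer>() {
  public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
});
JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
  public Integer call(Integer i1, Integer i2) { return i1 + i2; }
});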
@@ -179,7 +179,7 @@ just a matter of style.
 # Javadoc
 
 We currently provide documentation for the Java API as Scaladoc, in the
-[`spark.api.java` package](api/core/index.html#spark.api.java.package), because
+[`org.apache.spark.api.java` package](api/core/index.html#org.apache.spark.api.java.package), because
 some of the classes are implemented in Scala. The main downside is that the types and function
definitions show Scala syntax (for example, `def reduce(func: Function2[T, T, T]): T` instead of
`T reduce(Function2<T, T, T> func)`).
@@ -189,7 +189,7 @@ We hope to generate documentation with Java-style syntax in the future.
 # Where to Go from Here
 
 Spark includes several sample programs using the Java API in
-[`examples/src/main/java`](https://github.com/mesos/spark/tree/master/examples/src/main/java/spark/examples).  You can run them by passing the class name to the
+[`examples/src/main/java`](https://github.com/mesos/spark/tree/master/examples/src/main/java/org/apache/spark/examples).  You can run them by passing the class name to the
 `run-example` script included in Spark -- for example, `./run-example
-spark.examples.JavaWordCount`.  Each example program prints usage help when run
+org.apache.spark.examples.JavaWordCount`.  Each example program prints usage help when run
 without any arguments.

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4f422032/docs/quick-start.md
----------------------------------------------------------------------
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 4507b21..8cbc55b 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -108,7 +108,7 @@ We'll create a very simple Spark job in Scala. So simple, in fact, that it's nam
 
 {% highlight scala %}
 /*** SimpleJob.scala ***/
-import spark.SparkContext
+import org.apache.spark.SparkContext
 import SparkContext._
 
 object SimpleJob {
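  // Illustrative body (unchanged by this commit, hence elided from the hunk):
  // counts lines containing "a"; the file path is only a placeholder.
  def main(args: Array[String]) {
    val logFile = "$YOUR_SPARK_HOME/README.md"
    val sc = new SparkContext("local", "Simple Job")
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    println("Lines with a: " + numAs)
  }
}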
@@ -135,7 +135,7 @@ version := "1.0"
 
 scalaVersion := "{{site.SCALA_VERSION}}"
 
-libraryDependencies += "org.spark-project" %% "spark-core" % "{{site.SPARK_VERSION}}"
+libraryDependencies += "org.apache.spark" %% "spark-core" % "{{site.SPARK_VERSION}}"
 
 resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 {% endhighlight %}
@@ -170,8 +170,8 @@ We'll create a very simple Spark job, `SimpleJob.java`:
 
 {% highlight java %}
 /*** SimpleJob.java ***/
-import spark.api.java.*;
-import spark.api.java.function.Function;
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
 
 public class SimpleJob {
   public static void main(String[] args) {
@@ -213,7 +213,7 @@ To build the job, we also write a Maven `pom.xml` file that lists Spark as a dep
   </repositories>
   <dependencies>
     <dependency> <!-- Spark dependency -->
-      <groupId>org.spark-project</groupId>
+      <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_{{site.SCALA_VERSION}}</artifactId>
       <version>{{site.SPARK_VERSION}}</version>
     </dependency>

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4f422032/docs/scala-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/scala-programming-guide.md b/docs/scala-programming-guide.md
index e321b8f..5aa2b64 100644
--- a/docs/scala-programming-guide.md
+++ b/docs/scala-programming-guide.md
@@ -21,7 +21,7 @@ Spark {{site.SPARK_VERSION}} uses Scala {{site.SCALA_VERSION}}. If you write app
 
 To write a Spark application, you need to add a dependency on Spark. If you use SBT or Maven, Spark is available through Maven Central at:
 
-    groupId = org.spark-project
+    groupId = org.apache.spark
     artifactId = spark-core_{{site.SCALA_VERSION}}
     version = {{site.SPARK_VERSION}} 
 
@@ -36,7 +36,7 @@ For other build systems, you can run `sbt/sbt assembly` to pack Spark and its de
 Finally, you need to import some Spark classes and implicit conversions into your program. Add the following lines:
 
 {% highlight scala %}
-import spark.SparkContext
+import org.apache.spark.SparkContext
 import SparkContext._
 {% endhighlight %}
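A minimal sketch of the new imports in use (the master URL and data below are placeholders; `reduceByKey` becomes available on pair RDDs through the imported implicit conversions):

{% highlight scala %}
import org.apache.spark.SparkContext
import SparkContext._

val sc = new SparkContext("local", "Example App")
// reduceByKey on an RDD of pairs comes from the implicits imported above
val counts = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1))).reduceByKey(_ + _)
{% endhighlight %}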
 
@@ -142,7 +142,7 @@ All transformations in Spark are <i>lazy</i>, in that they do not compute their
 
By default, each transformed RDD is recomputed each time you run an action on it. However, you may also *persist* an RDD in memory using the `persist` (or `cache`) method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it. There is also support for persisting datasets on disk, or replicating them across the cluster. The next section in this document describes these options.
 
-The following tables list the transformations and actions currently supported (see also the [RDD API doc](api/core/index.html#spark.RDD) for details):
+The following tables list the transformations and actions currently supported (see also the [RDD API doc](api/core/index.html#org.apache.spark.RDD) for details):
 
 ### Transformations
 
@@ -211,7 +211,7 @@ The following tables list the transformations and actions currently supported (s
 </tr>
 </table>
 
-A complete list of transformations is available in the [RDD API doc](api/core/index.html#spark.RDD).
+A complete list of transformations is available in the [RDD API doc](api/core/index.html#org.apache.spark.RDD).
 
 ### Actions
 
@@ -259,7 +259,7 @@ A complete list of transformations is available in the [RDD API doc](api/core/in
 </tr>
 </table>
 
-A complete list of actions is available in the [RDD API doc](api/core/index.html#spark.RDD).
+A complete list of actions is available in the [RDD API doc](api/core/index.html#org.apache.spark.RDD).
 
 ## RDD Persistence
 
@@ -267,7 +267,7 @@ One of the most important capabilities in Spark is *persisting* (or *caching*) a
 
 You can mark an RDD to be persisted using the `persist()` or `cache()` methods on it. The first time it is computed in an action, it will be kept in memory on the nodes. The cache is fault-tolerant -- if any partition of an RDD is lost, it will automatically be recomputed using the transformations that originally created it.
 
-In addition, each RDD can be stored using a different *storage level*, allowing you, for example, to persist the dataset on disk, or persist it in memory but as serialized Java objects (to save space), or even replicate it across nodes. These levels are chosen by passing a [`spark.storage.StorageLevel`](api/core/index.html#spark.storage.StorageLevel) object to `persist()`. The `cache()` method is a shorthand for using the default storage level, which is `StorageLevel.MEMORY_ONLY` (store deserialized objects in memory). The complete set of available storage levels is:
+In addition, each RDD can be stored using a different *storage level*, allowing you, for example, to persist the dataset on disk, or persist it in memory but as serialized Java objects (to save space), or even replicate it across nodes. These levels are chosen by passing a [`org.apache.spark.storage.StorageLevel`](api/core/index.html#org.apache.spark.storage.StorageLevel) object to `persist()`. The `cache()` method is a shorthand for using the default storage level, which is `StorageLevel.MEMORY_ONLY` (store deserialized objects in memory). The complete set of available storage levels is:
 
 <table class="table">
 <tr><th style="width:23%">Storage Level</th><th>Meaning</th></tr>
@@ -318,7 +318,7 @@ We recommend going through the following process to select one:
   application). *All* the storage levels provide full fault tolerance by recomputing lost data, but the replicated ones
   let you continue running tasks on the RDD without waiting to recompute a lost partition.
  
-If you want to define your own storage level (say, with replication factor of 3 instead of 2), then use the function factor method `apply()` of the [`StorageLevel`](api/core/index.html#spark.storage.StorageLevel$) singleton object.  
+If you want to define your own storage level (say, with a replication factor of 3 instead of 2), then use the factory method `apply()` of the [`StorageLevel`](api/core/index.html#org.apache.spark.storage.StorageLevel$) singleton object.
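For example, a minimal sketch assuming an existing RDD `myRdd` (the level shown mirrors `MEMORY_AND_DISK` but with replication 3):

{% highlight scala %}
import org.apache.spark.storage.StorageLevel

// apply(useDisk, useMemory, deserialized, replication)
val MEMORY_AND_DISK_3 = StorageLevel(true, true, true, 3)
myRdd.persist(MEMORY_AND_DISK_3)
{% endhighlight %}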
 
 # Shared Variables
 
@@ -364,7 +364,7 @@ res2: Int = 10
 # Where to Go from Here
 
 You can see some [example Spark programs](http://www.spark-project.org/examples.html) on the Spark website.
-In addition, Spark includes several sample programs in `examples/src/main/scala`. Some of them have both Spark versions and local (non-parallel) versions, allowing you to see what had to be changed to make the program run on a cluster. You can run them using by passing the class name to the `run-example` script included in Spark -- for example, `./run-example spark.examples.SparkPi`. Each example program prints usage help when run without any arguments.
+In addition, Spark includes several sample programs in `examples/src/main/scala`. Some of them have both Spark versions and local (non-parallel) versions, allowing you to see what had to be changed to make the program run on a cluster. You can run them by passing the class name to the `run-example` script included in Spark -- for example, `./run-example org.apache.spark.examples.SparkPi`. Each example program prints usage help when run without any arguments.
 
 For help on optimizing your program, the [configuration](configuration.html) and
 [tuning](tuning.html) guides provide information on best practices. They are especially important for

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/4f422032/docs/streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 3330e63..e59d93d 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -34,7 +34,7 @@ ssc.textFileStream(directory)      // Creates a stream by monitoring and process
 ssc.socketStream(hostname, port)   // Creates a stream that uses a TCP socket to read data from hostname:port
 {% endhighlight %}
 
-We also provide a input streams for Kafka, Flume, Akka actor, etc. For a complete list of input streams, take a look at the [StreamingContext API documentation](api/streaming/index.html#spark.streaming.StreamingContext).
+We also provide input streams for Kafka, Flume, Akka actors, etc. For a complete list of input streams, take a look at the [StreamingContext API documentation](api/streaming/index.html#org.apache.spark.streaming.StreamingContext).
 
 
 
@@ -156,7 +156,7 @@ Spark Streaming features windowed computations, which allow you to apply transfo
 
 </table>
 
-A complete list of DStream operations is available in the API documentation of [DStream](api/streaming/index.html#spark.streaming.DStream) and [PairDStreamFunctions](api/streaming/index.html#spark.streaming.PairDStreamFunctions).
+A complete list of DStream operations is available in the API documentation of [DStream](api/streaming/index.html#org.apache.spark.streaming.DStream) and [PairDStreamFunctions](api/streaming/index.html#org.apache.spark.streaming.PairDStreamFunctions).
 
 ## Output Operations
 When an output operator is called, it triggers the computation of a stream. Currently the following output operators are defined:
@@ -206,8 +206,8 @@ ssc.stop()
A simple example to start off is the [NetworkWordCount](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/streaming/examples/NetworkWordCount.scala). This example counts the words received from a network server every second. Given below are the relevant sections of the source code. You can find the full source code in `<Spark repo>/streaming/src/main/scala/spark/streaming/examples/NetworkWordCount.scala`.
 
 {% highlight scala %}
-import spark.streaming.{Seconds, StreamingContext}
-import spark.streaming.StreamingContext._
+import org.apache.spark.streaming.{Seconds, StreamingContext}
+import StreamingContext._
 ...
 
 // Create the context and set up a network input stream to receive from a host:port
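// Illustrative continuation (elided from this hunk), roughly following the
// example in the repository:
val ssc = new StreamingContext(args(0), "NetworkWordCount", Seconds(1))
val lines = ssc.socketTextStream(args(1), args(2).toInt)

// Split each line into words, count each word per batch, and print the counts
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()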
@@ -234,7 +234,7 @@ $ nc -lk 9999
 Then, in a different terminal, you can start NetworkWordCount by using
 
 {% highlight bash %}
-$ ./run-example spark.streaming.examples.NetworkWordCount local[2] localhost 9999
+$ ./run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999
 {% endhighlight %}
 
 This will make NetworkWordCount connect to the netcat server. Any lines typed in the terminal running the netcat server will be counted and printed on screen.
@@ -272,7 +272,7 @@ Time: 1357008430000 ms
 </td>
 </table>
 
-You can find more examples in `<Spark repo>/streaming/src/main/scala/spark/streaming/examples/`. They can be run in the similar manner using `./run-example spark.streaming.examples....` . Executing without any parameter would give the required parameter list. Further explanation to run them can be found in comments in the files.
+You can find more examples in `<Spark repo>/streaming/src/main/scala/spark/streaming/examples/`. They can be run in a similar manner using `./run-example org.apache.spark.streaming.examples....`. Running one without any parameters prints the required parameter list. Further explanation on how to run them can be found in the comments in the files.
 
 # DStream Persistence
Similar to RDDs, DStreams also allow developers to persist the stream's data in memory. That is, using the `persist()` method on a DStream automatically persists every RDD of that DStream in memory. This is useful if the data in the DStream will be computed multiple times (e.g., multiple operations on the same data). For window-based operations like `reduceByWindow` and `reduceByKeyAndWindow` and state-based operations like `updateStateByKey`, this is implicitly true. Hence, DStreams generated by window-based operations are automatically persisted in memory, without the developer calling `persist()`.
@@ -315,7 +315,7 @@ Getting the best performance of a Spark Streaming application on a cluster requi
There are a number of optimizations that can be done in Spark to minimize the processing time of each batch. These have been discussed in detail in the [Tuning Guide](tuning.html). This section highlights some of the most important ones.
 
 ### Level of Parallelism
-Cluster resources maybe under-utilized if the number of parallel tasks used in any stage of the computation is not high enough. For example, for distributed reduce operations like `reduceByKey` and `reduceByKeyAndWindow`, the default number of parallel tasks is 8. You can pass the level of parallelism as an argument (see the [`spark.PairDStreamFunctions`](api/streaming/index.html#spark.PairDStreamFunctions) documentation), or set the system property `spark.default.parallelism` to change the default.
+Cluster resources may be under-utilized if the number of parallel tasks used in any stage of the computation is not high enough. For example, for distributed reduce operations like `reduceByKey` and `reduceByKeyAndWindow`, the default number of parallel tasks is 8. You can pass the level of parallelism as an argument (see the [`PairDStreamFunctions`](api/streaming/index.html#org.apache.spark.PairDStreamFunctions) documentation), or set the system property `spark.default.parallelism` to change the default.
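For example (a minimal sketch; `pairs` stands in for a DStream of key-value pairs built earlier, e.g. via `words.map(x => (x, 1))`):

{% highlight scala %}
// Use 16 parallel reduce tasks instead of the default 8
val counts = pairs.reduceByKey(_ + _, 16)
{% endhighlight %}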
 
 ### Data Serialization
 The overhead of data serialization can be significant, especially when sub-second batch sizes are to be achieved. There are two aspects to it.
@@ -345,7 +345,7 @@ This value is closely tied with any window operation that is being used. Any win
 ## Memory Tuning
Tuning the memory usage and GC behavior of Spark applications has been discussed in great detail in the [Tuning Guide](tuning.html). We recommend reading it. In this section, we highlight a few customizations that are strongly recommended to minimize GC-related pauses in Spark Streaming applications and achieve more consistent batch processing times.
 
-* **Default persistence level of DStreams**: Unlike RDDs, the default persistence level of DStreams serializes the data in memory (that is, [StorageLevel.MEMORY_ONLY_SER](api/core/index.html#spark.storage.StorageLevel$) for DStream compared to [StorageLevel.MEMORY_ONLY](api/core/index.html#spark.storage.StorageLevel$) for RDDs). Even though keeping the data serialized incurs a higher serialization overheads, it significantly reduces GC pauses.
+* **Default persistence level of DStreams**: Unlike RDDs, the default persistence level of DStreams serializes the data in memory (that is, [StorageLevel.MEMORY_ONLY_SER](api/core/index.html#org.apache.spark.storage.StorageLevel$) for DStreams compared to [StorageLevel.MEMORY_ONLY](api/core/index.html#org.apache.spark.storage.StorageLevel$) for RDDs). Even though keeping the data serialized incurs higher serialization overhead, it significantly reduces GC pauses.
 
 * **Concurrent garbage collector**: Using the concurrent mark-and-sweep GC further minimizes the variability of GC pauses. Even though concurrent GC is known to reduce the overall processing throughput of the system, its use is still recommended to achieve more consistent batch processing times.
 
@@ -467,10 +467,10 @@ If the driver had crashed in the middle of the processing of time 3, then it wil
 
 # Java API
 
-Similar to [Spark's Java API](java-programming-guide.html), we also provide a Java API for Spark Streaming which allows all its features to be accessible from a Java program. This is defined in [spark.streaming.api.java] (api/streaming/index.html#spark.streaming.api.java.package) package and includes [JavaStreamingContext](api/streaming/index.html#spark.streaming.api.java.JavaStreamingContext) and [JavaDStream](api/streaming/index.html#spark.streaming.api.java.JavaDStream) classes that provide the same methods as their Scala counterparts, but take Java functions (that is, Function, and Function2) and return Java data and collection types. Some of the key points to note are:
+Similar to [Spark's Java API](java-programming-guide.html), we also provide a Java API for Spark Streaming that makes all of its features accessible from a Java program. It is defined in the [org.apache.spark.streaming.api.java](api/streaming/index.html#org.apache.spark.streaming.api.java.package) package and includes the [JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext) and [JavaDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaDStream) classes, which provide the same methods as their Scala counterparts but take Java functions (that is, Function and Function2) and return Java data and collection types. Some of the key points to note are:
 
-1. Functions for transformations must be implemented as subclasses of [Function](api/core/index.html#spark.api.java.function.Function) and [Function2](api/core/index.html#spark.api.java.function.Function2)
-1. Unlike the Scala API, the Java API handles DStreams for key-value pairs using a separate [JavaPairDStream](api/streaming/index.html#spark.streaming.api.java.JavaPairDStream) class(similar to [JavaRDD and JavaPairRDD](java-programming-guide.html#rdd-classes). DStream functions like `map` and `filter` are implemented separately by JavaDStreams and JavaPairDStream to return DStreams of appropriate types.
+1. Functions for transformations must be implemented as subclasses of [Function](api/core/index.html#org.apache.spark.api.java.function.Function) and [Function2](api/core/index.html#org.apache.spark.api.java.function.Function2).
+1. Unlike the Scala API, the Java API handles DStreams for key-value pairs using a separate [JavaPairDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaPairDStream) class (similar to [JavaRDD and JavaPairRDD](java-programming-guide.html#rdd-classes)). DStream functions like `map` and `filter` are implemented separately by JavaDStream and JavaPairDStream to return DStreams of the appropriate types.
 
Spark's [Java Programming Guide](java-programming-guide.html) gives more ideas about using the Java API. To extend the ideas presented for RDDs to DStreams, we present parts of the Java version of the same NetworkWordCount example presented above. The full source code is given at `<spark repo>/examples/src/main/java/spark/streaming/examples/JavaNetworkWordCount.java`.
 
@@ -482,7 +482,7 @@ JavaDStream<String> lines = ssc.socketTextStream(ip, port);
 {% endhighlight %}
 
 
-Then the `lines` are split into words by using the `flatMap` function and [FlatMapFunction](api/core/index.html#spark.api.java.function.FlatMapFunction).
+Then the `lines` are split into words by using the `flatMap` function and [FlatMapFunction](api/core/index.html#org.apache.spark.api.java.function.FlatMapFunction).
 
 {% highlight java %}
 JavaDStream<String> words = lines.flatMap(
@@ -494,7 +494,7 @@ JavaDStream<String> words = lines.flatMap(
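  // Illustrative body (elided from this hunk): split each line on spaces;
  // java.util.Arrays is assumed imported in the full example.
  new FlatMapFunction<String, String>() {
    public Iterable<String> call(String x) {
      return Arrays.asList(x.split(" "));
    }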
   });
 {% endhighlight %}
 
-The `words` is then mapped to a [JavaPairDStream](api/streaming/index.html#spark.streaming.api.java.JavaPairDStream) of `(word, 1)` pairs using `map` and [PairFunction](api/core/index.html#spark.api.java.function.PairFunction). This is  reduced by using `reduceByKey` and [Function2](api/core/index.html#spark.api.java.function.Function2).
+The `words` DStream is then mapped to a [JavaPairDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaPairDStream) of `(word, 1)` pairs using `map` and [PairFunction](api/core/index.html#org.apache.spark.api.java.function.PairFunction). This is reduced by using `reduceByKey` and [Function2](api/core/index.html#org.apache.spark.api.java.function.Function2).
 
 {% highlight java %}
 JavaPairDStream<String, Integer> wordCounts = words.map(
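  // Illustrative continuation (elided from this hunk); scala.Tuple2 is
  // assumed imported in the full example.
  new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) {
      return new Tuple2<String, Integer>(s, 1);
    }
  }).reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer i1, Integer i2) {
      return i1 + i2;
    }
  });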
@@ -515,6 +515,6 @@ JavaPairDStream<String, Integer> wordCounts = words.map(
 
 
 # Where to Go from Here
-* API docs - [Scala](api/streaming/index.html#spark.streaming.package) and [Java](api/streaming/index.html#spark.streaming.api.java.package)
+* API docs - [Scala](api/streaming/index.html#org.apache.spark.streaming.package) and [Java](api/streaming/index.html#org.apache.spark.streaming.api.java.package)
 * More examples - [Scala](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/streaming/examples) and [Java](https://github.com/mesos/spark/tree/master/examples/src/main/java/spark/streaming/examples)
 * [Paper describing Spark Streaming](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf)