Posted to commits@samza.apache.org by ja...@apache.org on 2017/06/01 01:49:59 UTC

[5/5] samza git commit: SAMZA-1234: Documentation for 0.13.0 release

SAMZA-1234: Documentation for 0.13.0 release

Author: Jacob Maes <jm...@linkedin.com>

Reviewers: Yi Pan (Data Infrastructure) <ni...@gmail.com>, Boris Shkolnik <bo...@apache.org>

Closes #204 from jmakes/samza-1236-tutorial-1


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/bd132538
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/bd132538
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/bd132538

Branch: refs/heads/0.13.0
Commit: bd132538986cde976b24574064e673f376ce3ae6
Parents: 1ffcbc2
Author: Jacob Maes <jm...@linkedin.com>
Authored: Wed May 31 17:23:30 2017 -0700
Committer: vjagadish1989 <jv...@linkedin.com>
Committed: Wed May 31 18:13:02 2017 -0700

----------------------------------------------------------------------
 docs/_layouts/default.html                      |   1 +
 .../introduction/coordination-service.png       | Bin 0 -> 46941 bytes
 .../introduction/execution-plan.png             | Bin 0 -> 211288 bytes
 .../documentation/introduction/layered-arch.png | Bin 0 -> 54953 bytes
 .../wikipedia-execution-plan.png                | Bin 0 -> 97940 bytes
 .../versioned/rest/resources/jobs.md            |   2 +-
 .../versioned/yarn/yarn-host-affinity.md        |   2 +-
 .../versioned/hello-samza-high-level-code.md    | 421 +++++++++++++++
 .../versioned/hello-samza-high-level-yarn.md    | 127 +++++
 .../versioned/hello-samza-high-level-zk.md      | 111 ++++
 docs/learn/tutorials/versioned/index.md         |   9 +
 .../versioned/samza-async-user-guide.md         |   4 +-
 docs/startup/hello-samza/versioned/index.md     |  15 +-
 docs/startup/preview/index.md                   | 530 +++++++++++++++++++
 .../samza/system/kafka/KafkaStreamSpec.java     |   2 +-
 15 files changed, 1214 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/_layouts/default.html
----------------------------------------------------------------------
diff --git a/docs/_layouts/default.html b/docs/_layouts/default.html
index b9a05fa..e4a2d11 100644
--- a/docs/_layouts/default.html
+++ b/docs/_layouts/default.html
@@ -65,6 +65,7 @@
             <ul>
               <li><a href="/startup/hello-samza/{{ navLink }}">Hello Samza</a></li>
               <li><a href="/startup/download">Download</a></li>
+              <li><a href="/startup/preview">Feature Preview</a></li>
             </ul>
 
             <h1><i class="fa fa-book"></i> Learn</h1>

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/img/versioned/learn/documentation/introduction/coordination-service.png
----------------------------------------------------------------------
diff --git a/docs/img/versioned/learn/documentation/introduction/coordination-service.png b/docs/img/versioned/learn/documentation/introduction/coordination-service.png
new file mode 100644
index 0000000..02f30d6
Binary files /dev/null and b/docs/img/versioned/learn/documentation/introduction/coordination-service.png differ

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/img/versioned/learn/documentation/introduction/execution-plan.png
----------------------------------------------------------------------
diff --git a/docs/img/versioned/learn/documentation/introduction/execution-plan.png b/docs/img/versioned/learn/documentation/introduction/execution-plan.png
new file mode 100644
index 0000000..11f5caf
Binary files /dev/null and b/docs/img/versioned/learn/documentation/introduction/execution-plan.png differ

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/img/versioned/learn/documentation/introduction/layered-arch.png
----------------------------------------------------------------------
diff --git a/docs/img/versioned/learn/documentation/introduction/layered-arch.png b/docs/img/versioned/learn/documentation/introduction/layered-arch.png
new file mode 100644
index 0000000..ea12ef5
Binary files /dev/null and b/docs/img/versioned/learn/documentation/introduction/layered-arch.png differ

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/img/versioned/learn/tutorials/hello-samza-high-level/wikipedia-execution-plan.png
----------------------------------------------------------------------
diff --git a/docs/img/versioned/learn/tutorials/hello-samza-high-level/wikipedia-execution-plan.png b/docs/img/versioned/learn/tutorials/hello-samza-high-level/wikipedia-execution-plan.png
new file mode 100644
index 0000000..bb1e88c
Binary files /dev/null and b/docs/img/versioned/learn/tutorials/hello-samza-high-level/wikipedia-execution-plan.png differ

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/learn/documentation/versioned/rest/resources/jobs.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/rest/resources/jobs.md b/docs/learn/documentation/versioned/rest/resources/jobs.md
index 68ce9c2..8282a5d 100644
--- a/docs/learn/documentation/versioned/rest/resources/jobs.md
+++ b/docs/learn/documentation/versioned/rest/resources/jobs.md
@@ -301,7 +301,7 @@ The [SimpleYarnJobProxy](../javadocs/org/apache/samza/rest/proxy/job/SimpleYarnJ
 
 The following is a depiction of the implementation that ships with Samza REST:
 
-![JobsResourceDiagram](/img/{{site.version}}/learn/documentation/rest/JobsResource.png)
+<img src="/img/{{site.version}}/learn/documentation/rest/JobsResource.png" alt="Jobs resource component diagram" style="max-width: 100%; height: auto;" onclick="window.open(this.src)"/>
 
 ## Configuration
 The JobsResource properties should be specified in the same file as the Samza REST configuration. They are specified here for clarity.

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/learn/documentation/versioned/yarn/yarn-host-affinity.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/yarn/yarn-host-affinity.md b/docs/learn/documentation/versioned/yarn/yarn-host-affinity.md
index 14e10cc..fea9522 100644
--- a/docs/learn/documentation/versioned/yarn/yarn-host-affinity.md
+++ b/docs/learn/documentation/versioned/yarn/yarn-host-affinity.md
@@ -45,7 +45,7 @@ ls ${container_working_dir}/state/${store-name}/${task_name}/
 
 This allows the Node Manager's (NM) DeletionService to clean up the working directory once the application completes or fails. In order to reuse the local state store, the state store needs to be persisted outside the scope of the NM's deletion service. The cluster administrator should set this location as an environment variable in Yarn - <code>LOGGED\_STORE\_BASE\_DIR</code>.
 
-![samza-host-affinity](/img/{{site.version}}/learn/documentation/yarn/samza-host-affinity.png)
+<img src="/img/{{site.version}}/learn/documentation/yarn/samza-host-affinity.png" alt="Yarn host affinity component diagram" style="max-width: 100%; height: auto;" onclick="window.open(this.src)"/>
 
 Each time a task commits, Samza writes the last materialized offset from the changelog stream to a checksummed file on disk. This is also done on container shutdown. Thus, there is an *OFFSET* file associated with each state store's changelog partition, which is consumed by the tasks in the container.
 

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/learn/tutorials/versioned/hello-samza-high-level-code.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/versioned/hello-samza-high-level-code.md b/docs/learn/tutorials/versioned/hello-samza-high-level-code.md
new file mode 100644
index 0000000..6c0526e
--- /dev/null
+++ b/docs/learn/tutorials/versioned/hello-samza-high-level-code.md
@@ -0,0 +1,421 @@
+---
+layout: page
+title: Hello Samza High Level API - Code Walkthrough
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+This tutorial introduces the high level API by showing you how to build the Wikipedia application from the [hello-samza high level API Yarn tutorial](hello-samza-high-level-yarn.html). Upon completion of this tutorial, you'll know how to implement and configure a [StreamApplication](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/application/StreamApplication.html). Along the way, you'll see how to use some of the basic operators as well as how to leverage key-value stores and metrics in an app.
+
+The same [hello-samza](https://github.com/apache/samza-hello-samza) project is used for this tutorial as for many of the others. You will clone that project and by the end of the tutorial, you will have implemented a duplicate of the `WikipediaApplication`.
+
+Let's get started.
+
+### Get the Code
+
+Check out the hello-samza project:
+
+{% highlight bash %}
+git clone https://git.apache.org/samza-hello-samza.git hello-samza
+cd hello-samza
+git checkout latest
+{% endhighlight %}
+
+This project already contains implementations of the wikipedia application using both the low-level task API and the high-level API. The low-level task implementations are in the `samza.examples.wikipedia.task` package. The high-level application implementation is in the `samza.examples.wikipedia.application` package.
+
+This tutorial provides step-by-step instructions to recreate the existing Wikipedia application.
+
+### Introduction to Wikipedia Consumer
+In order to consume events from Wikipedia, the hello-samza project includes a `WikipediaSystemFactory` implementation of the Samza [SystemFactory](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/system/SystemFactory.html) that provides a `WikipediaConsumer`.
+
+The WikipediaConsumer is an implementation of [SystemConsumer](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/system/SystemConsumer.html) that can consume events from Wikipedia. It is also a listener for events from the `WikipediaFeed`. It's important to note that the events received in `onEvent` are of type `WikipediaFeedEvent`, so we will expect that type for messages on our input streams. For other systems, the messages may come in the form of `byte[]`. In that case, you may want to configure a Samza [serde](/learn/documentation/{{site.version}}/container/serialization.html), and the application should expect the output type of that serde.
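+
+To make this concrete, below is a simplified, hypothetical sketch of such a consumer built on Samza's `BlockingEnvelopeMap` helper (a common base class for `SystemConsumer` implementations). The class name, constructor, and the package of `WikipediaFeed` are illustrative assumptions, not the project's actual code:
+
+{% highlight java %}
+import org.apache.samza.SamzaException;
+import org.apache.samza.system.IncomingMessageEnvelope;
+import org.apache.samza.system.SystemStreamPartition;
+import org.apache.samza.util.BlockingEnvelopeMap;
+import samza.examples.wikipedia.system.WikipediaFeed; // assumed package
+
+public class SketchWikipediaConsumer extends BlockingEnvelopeMap
+    implements WikipediaFeed.WikipediaFeedListener {
+  private final SystemStreamPartition ssp; // one partition per Wikipedia channel
+
+  public SketchWikipediaConsumer(SystemStreamPartition ssp) {
+    this.ssp = ssp;
+  }
+
+  @Override
+  public void onEvent(WikipediaFeed.WikipediaFeedEvent event) {
+    try {
+      // An IRC feed has no key and no replayable offset, so both are null.
+      put(ssp, new IncomingMessageEnvelope(ssp, null, null, event));
+    } catch (InterruptedException e) {
+      throw new SamzaException(e);
+    }
+  }
+
+  @Override
+  public void start() { /* connect to the feed and subscribe (omitted) */ }
+
+  @Override
+  public void stop() { /* disconnect from the feed (omitted) */ }
+}
+{% endhighlight %}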
+
+Now that we understand the Wikipedia system and the types of inputs we'll be processing, we can proceed with creating our application.
+
+### Create the Initial Config
+In the hello-samza project, configs are kept in the _src/main/config/_ path. This is where we will add the config for our application.
+Create a new file named _my-wikipedia-application.properties_ in this location.
+
+#### Core Configuration
+Let's start by adding some of the core properties to the file:
+
+{% highlight bash %}
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+app.class=samza.examples.wikipedia.application.MyWikipediaApplication
+app.runner.class=org.apache.samza.runtime.RemoteApplicationRunner
+
+job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
+job.name=my-wikipedia-application
+job.default.system=kafka
+
+yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz
+{% endhighlight %}
+
+Be sure to include the Apache header. The project will not compile without it. 
+
+Here's a brief summary of what we configured so far.
+
+* **app.class**: the class that defines the application logic. We will create this class later.
+* **app.runner.class**: the runner implementation which will launch our application. Since we are using YARN, we use `RemoteApplicationRunner` which is required for any cluster-based deployment.
+* **job.factory.class**: the [factory](/learn/documentation/{{site.version}}/jobs/job-runner.html) that will create the runtime instances of our jobs. Since we are using YARN, we want each job to be created as a [YARN job](/learn/documentation/{{site.version}}/jobs/yarn-jobs.html), so we use `YarnJobFactory`.
+* **job.name**: the primary identifier for the job.
+* **job.default.system**: the default system to use for input, output, and internal metadata streams. This can be overridden on a per-stream basis. The _kafka_ system will be defined in the next section.
+* **yarn.package.path**: tells YARN where to find the [job package](/learn/documentation/{{site.version}}/jobs/packaging.html) so the Node Managers can download it.
+
+These basic configurations are enough to launch the application on YARN, but we haven’t defined any streaming systems for Samza to use, so the application would not process anything.
+
+Next, let's define the streaming systems with which the application will interact. 
+
+#### Define Systems
+This Wikipedia application will consume events from Wikipedia and produce stats to a Kafka topic. We need to define those systems in config before Samza can use them. Add the following lines to the config:
+
+{% highlight bash %}
+systems.wikipedia.samza.factory=samza.examples.wikipedia.system.WikipediaSystemFactory
+systems.wikipedia.host=irc.wikimedia.org
+systems.wikipedia.port=6667
+
+systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
+systems.kafka.consumer.zookeeper.connect=localhost:2181/
+systems.kafka.producer.bootstrap.servers=localhost:9092
+systems.kafka.default.stream.replication.factor=1
+systems.kafka.default.stream.samza.msg.serde=json
+{% endhighlight %}
+
+The above configuration defines 2 systems: one called _wikipedia_ and one called _kafka_.
+
+A factory is required for each system, so the _systems.system-name.samza.factory_ property is required for both systems. The other properties are system- and use-case-specific.
+
+For the _kafka_ system, we set the default replication factor to 1 for all streams because this application is intended for a demo deployment which utilizes a Kafka cluster with only 1 broker, so a replication factor larger than 1 is invalid. The default serde is _json_, which means any streams consumed from or produced to the _kafka_ system will, by default, use the _json_ serde, which we will define in the next section.
+
+The _wikipedia_ system does not need a serde because the `WikipediaConsumer` already produces a usable type.
+
+#### Serdes
+Next, we need to configure the [serdes](/learn/documentation/{{site.version}}/container/serialization.html) we will use for streams and stores in the application.
+{% highlight bash %}
+serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
+serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
+serializers.registry.integer.class=org.apache.samza.serializers.IntegerSerdeFactory
+{% endhighlight %}
+
+The _json_ serde was used for the _kafka_ system above. The _string_ and _integer_ serdes will be used later.
+
+#### Configure Streams
+Samza identifies streams using a unique stream ID. In most cases, the stream ID is the same as the actual stream name. However, if a stream has a name that doesn't match the pattern `[A-Za-z0-9_-]+`, we need to configure a separate _physical.name_ to associate the actual stream name with a legal stream ID. The Wikipedia channels we will consume have a '#' character in the names. So for each of them we must pick a legal stream ID and then configure the physical name to match the channel.
+
+Samza uses the _job.default.system_ for any streams that do not explicitly specify a system. In the previous sections, we defined 2 systems, _wikipedia_ and _kafka_, and we configured _kafka_ as the default. To understand why, let's look at the streams and how Samza will use them.
+
+For this app, Samza will:
+
+1. Consume from input streams
+2. Produce to an output stream and a metrics stream
+3. Both produce and consume from job-coordination, checkpoint, and changelog streams
+
+While the _wikipedia_ system is necessary for case 1, it does not support producers (we can't write Samza output to Wikipedia), which are needed for cases 2-3. So it is more convenient to use _kafka_ as the default system. We can then explicitly configure the input streams to use the _wikipedia_ system.
+
+{% highlight bash %}
+streams.en-wikipedia.samza.system=wikipedia
+streams.en-wikipedia.samza.physical.name=#en.wikipedia
+
+streams.en-wiktionary.samza.system=wikipedia
+streams.en-wiktionary.samza.physical.name=#en.wiktionary
+
+streams.en-wikinews.samza.system=wikipedia
+streams.en-wikinews.samza.physical.name=#en.wikinews
+{% endhighlight %}
+
+The above configuration declares 3 streams with the IDs _en-wikipedia_, _en-wiktionary_, and _en-wikinews_. It associates each stream with the _wikipedia_ system we defined earlier and sets the physical name to the corresponding Wikipedia channel.
+
+Since all the Kafka streams for cases 2-3 are on the default system and do not include special characters in their names, we do not need to configure them explicitly.
+
+### Create a StreamApplication
+
+With the core configuration settled, we turn our attention to code.
+
+### Define Application Logic
+Let's create the application class we configured above. The next 8 sections walk you through writing the code for the Wikipedia application.
+
+Create a new class named `MyWikipediaApplication` in the `samza.examples.wikipedia.application` package. The class must implement [StreamApplication](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/application/StreamApplication.html) and should look like this:
+
+{% highlight java %}
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package samza.examples.wikipedia.application;
+
+import org.apache.samza.application.StreamApplication;
+import org.apache.samza.config.Config;
+import org.apache.samza.operators.StreamGraph;
+
+public class MyWikipediaApplication implements StreamApplication {
+  @Override
+  public void init(StreamGraph streamGraph, Config config) {
+    
+  }
+}
+{% endhighlight %}
+
+Be sure to include the Apache header. The project will not compile without it.
+
+The [init](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/application/StreamApplication.html#init-org.apache.samza.operators.StreamGraph-org.apache.samza.config.Config-) method is where the application logic is defined. The [Config](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/config/Config.html) argument is the runtime configuration loaded from the properties file we defined earlier. The [StreamGraph](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/StreamGraph.html) argument provides methods to declare input streams. You can then invoke a number of flexible operations on those streams. The result of each operation is another stream, so you can keep chaining more operations or direct the result to an output stream.
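+
+For instance, a minimal, hypothetical chain over a string stream might look like the following sketch; the stream ID and logic here are placeholders, not part of the Wikipedia application:
+
+{% highlight java %}
+MessageStream<String> lines = streamGraph.getInputStream("my-input", (k, v) -> (String) v);
+lines.filter(line -> !line.isEmpty())  // each operator returns a new MessageStream...
+     .map(String::toUpperCase);        // ...so operations can be chained fluently
+{% endhighlight %}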
+
+Next, we will declare the input streams for the Wikipedia application.
+
+#### Inputs
+The Wikipedia application consumes events from three channels. Let's declare each of those channels as an input stream via the [StreamGraph](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/StreamGraph.html) in the [init](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/application/StreamApplication.html#init-org.apache.samza.operators.StreamGraph-org.apache.samza.config.Config-) method.
+{% highlight java %}
+MessageStream<WikipediaFeedEvent> wikipediaEvents = streamGraph.getInputStream("en-wikipedia", (k, v) -> (WikipediaFeedEvent) v);
+MessageStream<WikipediaFeedEvent> wiktionaryEvents = streamGraph.getInputStream("en-wiktionary", (k, v) -> (WikipediaFeedEvent) v);
+MessageStream<WikipediaFeedEvent> wikiNewsEvents = streamGraph.getInputStream("en-wikinews", (k, v) -> (WikipediaFeedEvent) v);
+{% endhighlight %}
+
+The first argument to the [getInputStream](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/StreamGraph.html#getInputStream-java.lang.String-java.util.function.BiFunction-) method is the stream ID. Each ID must match the corresponding stream IDs we configured earlier.
+
+The second argument is the *message builder*. It converts the input key and message to the appropriate type. In this case, we don't have a key and want to send the events as-is, so we have a very simple builder that just forwards the input value.
+
+Note that the streams are all MessageStreams of type WikipediaFeedEvent. [MessageStream](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html) is the in-memory representation of a stream in Samza. It uses generics to ensure type safety across the streams and operations. We knew the WikipediaFeedEvent type by inspecting the WikipediaConsumer above, and we made it explicit with the cast on the output of the message builder. If our inputs used a serde, we would know the type based on which serde is configured for the input streams.
+
+#### Merge
+We'd like to use the same processing logic for all three input streams, so we will use the [mergeAll](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#mergeAll-java.util.Collection-) operator to merge them together. Note: this is not the same as a [join](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#join-org.apache.samza.operators.MessageStream-org.apache.samza.operators.functions.JoinFunction-java.time.Duration-) because we are not associating events by key. We are simply combining three streams into one, like a union.
+
+Add the following snippet to the [init](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/application/StreamApplication.html#init-org.apache.samza.operators.StreamGraph-org.apache.samza.config.Config-) method. It merges all the input streams into a new one called _allWikipediaEvents_:
+{% highlight java %}
+MessageStream<WikipediaFeed.WikipediaFeedEvent> allWikipediaEvents = MessageStream.mergeAll(ImmutableList.of(wikipediaEvents, wiktionaryEvents, wikiNewsEvents));
+{% endhighlight %}
+
+Note there is a [merge](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#merge-java.util.Collection-) operator instance method on MessageStream, but the static [mergeAll](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#mergeAll-java.util.Collection-) method is a more convenient alternative if you need to merge many streams.
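+
+For comparison, merging the same three streams with the instance method would look something like this sketch:
+
+{% highlight java %}
+MessageStream<WikipediaFeed.WikipediaFeedEvent> allWikipediaEvents =
+    wikipediaEvents.merge(ImmutableList.of(wiktionaryEvents, wikiNewsEvents));
+{% endhighlight %}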
+
+#### Parse
+The next step is to parse the events and extract some information. We will use the pre-existing `WikipediaParser.parseEvent()` method to do this. The parser extracts some flags we want to monitor as well as some metadata about the event. Inspect the method signature. The input is a `WikipediaFeedEvent` and the output is a `Map<String, Object>`. These types will be reflected in the types of the streams before and after the operation.
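+
+Based on that description, the parser's outline should look roughly like the following sketch (the body is elided; see `WikipediaParser` in the project for the real implementation):
+
+{% highlight java %}
+public static Map<String, Object> parseEvent(WikipediaFeed.WikipediaFeedEvent event) {
+  Map<String, Object> parsed = new HashMap<>();
+  // Extract fields such as "title", "diff-bytes", and the "flags" map
+  // from the raw event (omitted here).
+  return parsed;
+}
+{% endhighlight %}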
+
+In the [init](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/application/StreamApplication.html#init-org.apache.samza.operators.StreamGraph-org.apache.samza.config.Config-) method, invoke the [map](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#map-org.apache.samza.operators.functions.MapFunction-) operation on `allWikipediaEvents`, passing the `WikipediaParser::parseEvent` method reference as follows:
+
+{% highlight java %}
+allWikipediaEvents.map(WikipediaParser::parseEvent);
+{% endhighlight %}
+
+#### Window
+Now that we have the relevant information extracted, let's perform some aggregations over a 10-second [window](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/windows/Window.html).
+
+First, we need a container class for statistics we want to track. Add the following static class after the [init](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/application/StreamApplication.html#init-org.apache.samza.operators.StreamGraph-org.apache.samza.config.Config-) method.
+{% highlight java %}
+private static class WikipediaStats {
+  int edits = 0;
+  int byteDiff = 0;
+  Set<String> titles = new HashSet<String>();
+  Map<String, Integer> counts = new HashMap<String, Integer>();
+}
+{% endhighlight %}
+
+Now we need to define the logic to aggregate the stats over the duration of the window. To do this, we implement [FoldLeftFunction](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/functions/FoldLeftFunction.html) by adding the following class after the `WikipediaStats` class:
+{% highlight java %}
+private class WikipediaStatsAggregator implements FoldLeftFunction<Map<String, Object>, WikipediaStats> {
+
+  @Override
+  public WikipediaStats apply(Map<String, Object> edit, WikipediaStats stats) {
+    // Update window stats
+    stats.edits++;
+    stats.byteDiff += (Integer) edit.get("diff-bytes");
+    stats.titles.add((String) edit.get("title"));
+
+    Map<String, Boolean> flags = (Map<String, Boolean>) edit.get("flags");
+    for (Map.Entry<String, Boolean> flag : flags.entrySet()) {
+      if (Boolean.TRUE.equals(flag.getValue())) {
+        stats.counts.compute(flag.getKey(), (k, v) -> v == null ? 1 : v + 1);
+      }
+    }
+
+    return stats;
+  }
+}
+{% endhighlight %}
+
+Note: the type parameters for [FoldLeftFunction](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/functions/FoldLeftFunction.html) reflect the upstream data type and the window value type, respectively. In our case, the upstream type is the output of the parser and the window value is our `WikipediaStats` class.
+
+Finally, we can define our [window](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/windows/Window.html) back in the [init](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/application/StreamApplication.html#init-org.apache.samza.operators.StreamGraph-org.apache.samza.config.Config-) method by chaining the result of the parser:
+{% highlight java %}
+allWikipediaEvents.map(WikipediaParser::parseEvent)
+        .window(Windows.tumblingWindow(Duration.ofSeconds(10), WikipediaStats::new, new WikipediaStatsAggregator()));
+{% endhighlight %}
+
+This defines an unkeyed [tumbling window](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/windows/Windows.html) that spans 10s, which instantiates a new `WikipediaStats` object at the beginning of each window and aggregates the stats using `WikipediaStatsAggregator`.
+
+The output of the window is a [WindowPane](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/windows/WindowPane.html) with a key and value. Since we used an unkeyed tumbling window, the key is `Void`. The value is our `WikipediaStats` object.
+
+#### Output
+We want to use a JSON serializer to output the window values to Kafka, so we will do one more [map](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#map-org.apache.samza.operators.functions.MapFunction-) to format the output.
+
+First, let's define the method to format the stats as a `Map<String, Integer>` so the _json_ serde can handle it. Paste the following after the aggregator class:
+{% highlight java %}
+private Map<String, Integer> formatOutput(WindowPane<Void, WikipediaStats> statsWindowPane) {
+  WikipediaStats stats = statsWindowPane.getMessage();
+
+  Map<String, Integer> counts = new HashMap<String, Integer>(stats.counts);
+  counts.put("edits", stats.edits);
+  counts.put("bytes-added", stats.byteDiff);
+  counts.put("unique-titles", stats.titles.size());
+
+  return counts;
+}
+{% endhighlight %}
+
+Now, we can invoke the method by adding another [map](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#map-org.apache.samza.operators.functions.MapFunction-) operation to the chain in [init](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/application/StreamApplication.html#init-org.apache.samza.operators.StreamGraph-org.apache.samza.config.Config-). The operator chain should now look like this:
+{% highlight java %}
+allWikipediaEvents.map(WikipediaParser::parseEvent)
+        .window(Windows.tumblingWindow(Duration.ofSeconds(10), WikipediaStats::new, new WikipediaStatsAggregator()))
+        .map(this::formatOutput);
+{% endhighlight %}
+
+Next we need to get the output stream to which we will send the stats. Insert the following line below the creation of the 3 input streams:
+{% highlight java %}
+OutputStream<Void, Map<String, Integer>, Map<String, Integer>>
+        wikipediaStats = streamGraph.getOutputStream("wikipedia-stats", m -> null, m -> m);
+{% endhighlight %}
+
+The [OutputStream](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/OutputStream.html) is parameterized by 3 types: the key type for the output, the value type for the output, and the upstream type.
+
+The first parameter of [getOutputStream](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/StreamGraph.html#getOutputStream-java.lang.String-java.util.function.Function-java.util.function.Function-) is the output stream ID. We will use _wikipedia-stats_, and since it contains no special characters, we won't bother configuring a physical name, so Samza will use the stream ID as the topic name.
+
+The second and third parameters are the *key extractor* and the *message extractor*, respectively. We have no key, so the *key extractor* simply produces null. The *message extractor* simply passes the message because it's already the correct type for the _json_ serde. Note: we could have skipped the previous [map](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#map-org.apache.samza.operators.functions.MapFunction-) operator and invoked our formatter here, but we kept them separate for pedagogical purposes.
+
+Finally, we can send our output to the output stream using the [sendTo](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#sendTo-org.apache.samza.operators.OutputStream-) operator:
+{% highlight java %}
+allWikipediaEvents.map(WikipediaParser::parseEvent)
+        .window(Windows.tumblingWindow(Duration.ofSeconds(10), WikipediaStats::new, new WikipediaStatsAggregator()))
+        .map(this::formatOutput)
+        .sendTo(wikipediaStats);
+{% endhighlight %}
+
+Tip: Because the MessageStream type information is preserved in the operator chain, it is often easier to define the OutputStream inline with the [sendTo](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/MessageStream.html#sendTo-org.apache.samza.operators.OutputStream-) operator and then refactor it for readability. That way you don't have to hunt down the types.
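+
+For example, the inline form of the chain above would look something like this:
+
+{% highlight java %}
+allWikipediaEvents.map(WikipediaParser::parseEvent)
+        .window(Windows.tumblingWindow(Duration.ofSeconds(10), WikipediaStats::new, new WikipediaStatsAggregator()))
+        .map(this::formatOutput)
+        .sendTo(streamGraph.getOutputStream("wikipedia-stats", m -> null, m -> m));
+{% endhighlight %}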
+
+#### KVStore
+We now have an operational Wikipedia application which provides stats aggregated over a 10-second interval. One of those stats is a count of the number of edits within the 10s window. But what if we want to keep an additional durable counter of the total edits?
+
+We will do this by keeping a separate count outside the window and persisting it in a [KeyValueStore](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/storage/kv/KeyValueStore.html).
+
+We start by defining the store in the config file:
+{% highlight bash %}
+stores.wikipedia-stats.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
+stores.wikipedia-stats.changelog=kafka.wikipedia-stats-changelog
+stores.wikipedia-stats.key.serde=string
+stores.wikipedia-stats.msg.serde=integer
+{% endhighlight %}
+
+These properties declare a [RocksDB](http://rocksdb.org/) key-value store named "wikipedia-stats". The store is replicated to a changelog stream called "wikipedia-stats-changelog" on the _kafka_ system for durability. It uses the _string_ and _integer_ serdes we defined earlier for keys and values, respectively.
+
+Next, we add a total count member variable to the `WikipediaStats` class:
+{% highlight java %}
+int totalEdits = 0;
+{% endhighlight %}
+
+To use the store in the application, we need to get it from the [TaskContext](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/task/TaskContext.html). Also, since we want to emit the total edit count along with the window edit count, it's easiest to update both of them in our aggregator. Declare the store as a member variable of the `WikipediaStatsAggregator` class:
+{% highlight java %}
+private KeyValueStore<String, Integer> store;
+{% endhighlight %}
+
+Then override the [init](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/operators/functions/InitableFunction.html#init-org.apache.samza.config.Config-org.apache.samza.task.TaskContext-) method in `WikipediaStatsAggregator` to initialize the store.
+{% highlight java %}
+@Override
+public void init(Config config, TaskContext context) {
+  store = (KeyValueStore<String, Integer>) context.getStore("wikipedia-stats");
+}
+{% endhighlight %}
+
+Update and persist the counter in the `apply` method.
+{% highlight java %}
+Integer editsAllTime = store.get("count-edits-all-time");
+if (editsAllTime == null) editsAllTime = 0;
+editsAllTime++;
+store.put("count-edits-all-time", editsAllTime);
+stats.totalEdits = editsAllTime;
+{% endhighlight %}
+
+Finally, update the `MyWikipediaApplication#formatOutput` method to include the total counter.
+{% highlight java %}
+counts.put("edits-all-time", stats.totalEdits);
+{% endhighlight %}
+
+#### Metrics
+Lastly, let's add a metric to the application that counts the number of repeat edits to the same title within the window interval.
+
+As with the key-value store, we must first define the metrics reporters in the config file.
+{% highlight bash %}
+metrics.reporters=snapshot,jmx
+metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
+metrics.reporter.snapshot.stream=kafka.metrics
+metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
+{% endhighlight %}
+
+The above properties define 2 metrics reporters. The first emits metrics to a _metrics_ topic on the _kafka_ system. The second reporter emits metrics to JMX.
+
+In the WikipediaStatsAggregator, declare a counter member variable.
+{% highlight java %}
+private Counter repeatEdits;
+{% endhighlight %}
+
+Then add the following to the `WikipediaStatsAggregator#init` method to initialize the counter.
+{% highlight java %}
+repeatEdits = context.getMetricsRegistry().newCounter("edit-counters", "repeat-edits");
+{% endhighlight %}
+
+Update the counter in the `apply` method by replacing the existing `stats.titles.add(...)` call with the following:
+{% highlight java %}
+boolean newTitle = stats.titles.add((String) edit.get("title"));
+
+if (!newTitle) {
+  repeatEdits.inc();
+  log.info("Frequent edits for title: {}", edit.get("title"));
+}
+{% endhighlight %}
+
+#### Run and View Plan
+You can set up the grid and run the application using the same instructions from the [hello-samza high level API Yarn tutorial](hello-samza-high-level-yarn.html). The only difference is to replace the `wikipedia-application.properties` config file in the _config-path_ command line parameter with `my-wikipedia-application.properties`.
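+
+Concretely, the launch command from that tutorial becomes:
+
+{% highlight bash %}
+./deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/my-wikipedia-application.properties
+{% endhighlight %}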
+
+### Summary
+Congratulations! You have built and executed a Wikipedia stream application on Samza using the high level API. The final application should be directly comparable to the pre-existing `WikipediaApplication` in the project.
+
+You can provide feedback on this tutorial in the [dev mailing list](mailto:dev@samza.apache.org).

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md b/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md
new file mode 100644
index 0000000..1ad40df
--- /dev/null
+++ b/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md
@@ -0,0 +1,127 @@
+---
+layout: page
+title: Hello Samza High Level API - YARN Deployment
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+The [hello-samza](https://github.com/apache/samza-hello-samza) project is an example project designed to help you run your first Samza application. It has examples of applications using the low level task API as well as the high level API.
+
+This tutorial demonstrates a simple Wikipedia application created with the high level API. The [Hello Samza tutorial](/startup/hello-samza/{{site.version}}/index.html) is the low-level analog to this tutorial. It demonstrates the same logic but is created with the task API. The tutorials are designed to be as similar as possible. The primary differences are that with the high level API we accomplish the equivalent of 3 separate low-level jobs with a single application, we skip the intermediate topics for simplicity, and we can visualize the execution plan after we start the application.
+
+### Get the Code
+
+Check out the hello-samza project:
+
+{% highlight bash %}
+git clone https://git.apache.org/samza-hello-samza.git hello-samza
+cd hello-samza
+git checkout latest
+{% endhighlight %}
+
+This project contains everything you'll need to run your first Samza application.
+
+### Start a Grid
+
+A Samza grid usually comprises three different systems: [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html), [Kafka](http://kafka.apache.org/), and [ZooKeeper](http://zookeeper.apache.org/). The hello-samza project comes with a script called "grid" to help you setup these systems. Start by running:
+
+{% highlight bash %}
+./bin/grid bootstrap
+{% endhighlight %}
+
+This command will download, install, and start ZooKeeper, Kafka, and YARN. It will also check out the latest version of Samza and build it. All package files will be put in a sub-directory called "deploy" inside hello-samza's root folder.
+
+If you get a complaint that JAVA_HOME is not set, then you'll need to set it to the path where Java is installed on your system.
+
+Once the grid command completes, you can verify that YARN is up and running by going to [http://localhost:8088](http://localhost:8088). This is the YARN UI.
+
+### Build a Samza Application Package
+
+Before you can run a Samza application, you need to build a package for it. This package is what YARN uses to deploy your apps on the grid.
+
+NOTE: if you are building from the latest branch of the hello-samza project, make sure that you run the following step from your local Samza project first:
+
+{% highlight bash %}
+./gradlew publishToMavenLocal
+{% endhighlight %}
+
+Then, you can continue with the following commands in the hello-samza project:
+
+{% highlight bash %}
+mvn clean package
+mkdir -p deploy/samza
+tar -xvf ./target/hello-samza-0.13.0-SNAPSHOT-dist.tar.gz -C deploy/samza
+{% endhighlight %}
+
+### Run a Samza Application
+
+After you've built your Samza package, you can start the app on the grid using the run-app.sh script.
+
+{% highlight bash %}
+./deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-application.properties
+{% endhighlight %}
+
+The app will do all of the following:
+
+1. Consume 3 feeds of real-time edits from Wikipedia
+2. Parse the events to extract information about the size of the edit, who made the change, etc.
+3. Calculate counts, every ten seconds, for all edits that were made during that window
+4. Output the counts to the wikipedia-stats topic
+
+For details about how the app works, take a look at the [code walkthrough](hello-samza-high-level-code.html).
+
+Give the job a minute to start up, and then tail the Kafka topic:
+
+{% highlight bash %}
+./deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-stats
+{% endhighlight %}
+
+The messages in the stats topic look like this:
+
+{% highlight json %}
+{"is-talk":2,"bytes-added":5276,"edits":13,"unique-titles":13}
+{"is-bot-edit":1,"is-talk":3,"bytes-added":4211,"edits":30,"unique-titles":30,"is-unpatrolled":1,"is-new":2,"is-minor":7}
+{"bytes-added":3180,"edits":19,"unique-titles":19,"is-unpatrolled":1,"is-new":1,"is-minor":3}
+{"bytes-added":2218,"edits":18,"unique-titles":18,"is-unpatrolled":2,"is-new":2,"is-minor":3}
+{% endhighlight %}
+
+Pretty neat, right? Now, check out the YARN UI again ([http://localhost:8088](http://localhost:8088)). This time around, you'll see your Samza job is running!
+
+### View the Execution Plan
+Each application goes through an execution planner, and you can visualize the resulting execution plan after starting the job by opening the following file in a browser:
+{% highlight bash %}
+deploy/samza/bin/plan.html
+{% endhighlight %}
+
+This plan will make more sense after the [code walkthrough](hello-samza-high-level-code.html). For now, just take note that this visualization is available; it provides useful visibility into the structure of the application. For this tutorial, the plan should look something like this:
+
+<img src="/img/{{site.version}}/learn/tutorials/hello-samza-high-level/wikipedia-execution-plan.png" alt="Execution plan" style="max-width: 100%; height: auto;" onclick="window.open(this.src)"/>
+
+
+### Shutdown
+
+To shut down the app, use the same _run-app.sh_ script with an extra _--operation=kill_ argument:
+{% highlight bash %}
+./deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-application.properties --operation=kill
+{% endhighlight %}
+
+After you're done, you can clean everything up using the same grid script.
+
+{% highlight bash %}
+./bin/grid stop all
+{% endhighlight %}
+
+Congratulations! You've now set up a local grid that includes YARN, Kafka, and ZooKeeper, and run a Samza application on it. Curious how this application was built? See the [code walk-through](hello-samza-high-level-code.html).

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/learn/tutorials/versioned/hello-samza-high-level-zk.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/versioned/hello-samza-high-level-zk.md b/docs/learn/tutorials/versioned/hello-samza-high-level-zk.md
new file mode 100644
index 0000000..9fd947b
--- /dev/null
+++ b/docs/learn/tutorials/versioned/hello-samza-high-level-zk.md
@@ -0,0 +1,111 @@
+---
+layout: page
+title: Hello Samza High Level API - Zookeeper Deployment
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+The [hello-samza](https://github.com/apache/samza-hello-samza) project is designed to help you get started with your first Samza job.
+In this tutorial, we will learn how to run a Samza application using the ZooKeeper deployment model.
+
+### Get the Code
+
+Let's get started by cloning the hello-samza project:
+
+{% highlight bash %}
+git clone https://git.apache.org/samza-hello-samza.git hello-samza
+cd hello-samza
+git checkout latest
+{% endhighlight %}
+
+The project comes with numerous examples; for this tutorial, we will pick the Wikipedia application.
+
+### Setting up the Deployment Environment
+
+For our Wikipedia application, we require two systems: [Kafka](http://kafka.apache.org/) and [ZooKeeper](http://zookeeper.apache.org/). The hello-samza project comes with a script called "grid" to help with the environment setup:
+
+{% highlight bash %}
+./bin/grid standalone
+{% endhighlight %}
+
+This command will download, install, and start ZooKeeper and Kafka. It will also check out the latest version of Samza and build it. All package files will be put in a sub-directory called "deploy" inside hello-samza's root folder.
+
+If you get a complaint that JAVA_HOME is not set, then you'll need to set it to the path where Java is installed on your system.
+
+### Building the Hello Samza Project
+
+NOTE: if you are building from the latest branch of the hello-samza project and want to use your local copy of Samza, make sure that you run the following step from your local Samza project first:
+
+{% highlight bash %}
+./gradlew publishToMavenLocal
+{% endhighlight %}
+
+With the environment setup complete, let us move on to building the hello-samza project. Execute the following commands:
+
+{% highlight bash %}
+mvn clean package
+mkdir -p deploy/samza
+tar -xvf ./target/hello-samza-0.13.0-SNAPSHOT-dist.tar.gz -C deploy/samza
+{% endhighlight %}
+
+We are now all set to deploy the application locally.
+
+### Running the Wikipedia application
+
+In order to run the application, we will use the *run-wikipedia-zk-application* script.
+
+{% highlight bash %}
+./deploy/samza/bin/run-wikipedia-zk-application.sh
+{% endhighlight %}
+
+The above command executes the helper script, which invokes the *WikipediaZkLocalApplication* main class with the appropriate job configurations as command line arguments. The main class is an application wrapper
+that initializes the application and passes it to the local runner for execution. It blocks and waits for the *LocalApplicationRunner* to finish.
+
+To run your own application using the ZooKeeper deployment model, you would need something similar to the *WikipediaZkLocalApplication* class that initializes your application
+and uses the *LocalApplicationRunner* to run it. To learn more about the internals, check out the [deployment-models](/startup/preview/) documentation and the [configurations](/learn/documentation/{{site.version}}/jobs/configuration-table.html) table.
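+
+For reference, a minimal sketch of such a wrapper might look like the following; it assumes Samza's *CommandLine* utility and the existing *WikipediaApplication* class, and elides error handling (see *WikipediaZkLocalApplication* in hello-samza for the real code):
+
+{% highlight java %}
+import org.apache.samza.config.Config;
+import org.apache.samza.runtime.LocalApplicationRunner;
+import org.apache.samza.util.CommandLine;
+import samza.examples.wikipedia.application.WikipediaApplication;
+
+public class MyZkLocalApplication {
+  public static void main(String[] args) {
+    // Parse the command line arguments into a Config.
+    CommandLine cmdLine = new CommandLine();
+    Config config = cmdLine.loadConfig(cmdLine.parser().parse(args));
+
+    // Hand the application to the local runner and block until it finishes.
+    LocalApplicationRunner runner = new LocalApplicationRunner(config);
+    runner.run(new WikipediaApplication());
+    runner.waitForFinish();
+  }
+}
+{% endhighlight %}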
+
+Getting back to our example: the application consumes a feed of real-time edits from Wikipedia and produces aggregated stats to a Kafka topic called "wikipedia-stats". Give the job a minute to start up, and then tail the Kafka topic by running the following command:
+
+{% highlight bash %}
+./deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-stats
+{% endhighlight %}
+
+The messages in the stats topic should look like the sample below:
+
+{% highlight json %}
+{"is-talk":2,"bytes-added":5276,"edits":13,"unique-titles":13}
+{"is-bot-edit":1,"is-talk":3,"bytes-added":4211,"edits":30,"unique-titles":30,"is-unpatrolled":1,"is-new":2,"is-minor":7}
+{"bytes-added":3180,"edits":19,"unique-titles":19,"is-unpatrolled":1,"is-new":1,"is-minor":3}
+{"bytes-added":2218,"edits":18,"unique-titles":18,"is-unpatrolled":2,"is-new":2,"is-minor":3}
+{% endhighlight %}
+
+Excellent! Now that the job is running, open the *plan.html* file under the *deploy/samza/bin* directory to take a look at the execution plan for the Wikipedia application.
+The execution plan is a colorful graphic representing various stages of your application and how they are connected. Here is a sample plan visualization:
+
+<img src="/img/{{site.version}}/learn/tutorials/hello-samza-high-level/wikipedia-execution-plan.png" alt="Execution plan" style="max-width: 100%; height: auto;" onclick="window.open(this.src)"/>
+
+
+### Shutdown
+
+The Wikipedia application can be shut down by terminating the *run-wikipedia-zk-application* script.
+We can then use the *grid* script to tear down the local environment ([Kafka](http://kafka.apache.org/) and [ZooKeeper](http://zookeeper.apache.org/)).
+
+{% highlight bash %}
+bin/grid stop all
+{% endhighlight %}
+
+Congratulations! You've now successfully run a Samza application using the ZooKeeper deployment model. Next up, check out the [deployment models](/startup/preview/) and [high level API](/startup/preview/) pages.

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/learn/tutorials/versioned/index.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/versioned/index.md b/docs/learn/tutorials/versioned/index.md
index 6d6295f..a9ac6a7 100644
--- a/docs/learn/tutorials/versioned/index.md
+++ b/docs/learn/tutorials/versioned/index.md
@@ -18,6 +18,15 @@ title: Tutorials
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
+<!-- Uncomment after these features are fully released
+[[Preview] Hello Samza High Level API Zookeeper Deployment](hello-samza-high-level-zk.html)
+
+[[Preview] Hello Samza High Level API Yarn Deployment](hello-samza-high-level-yarn.html)
+
+[[Preview] Hello Samza High Level API Code](hello-samza-high-level-code.html)
+-->
+
+[Hello Samza Low Level API Yarn Deployment](/startup/hello-samza/{{site.version}}/)
 
 [Remote Debugging with Samza](remote-debugging-samza.html)
 

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/learn/tutorials/versioned/samza-async-user-guide.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/versioned/samza-async-user-guide.md b/docs/learn/tutorials/versioned/samza-async-user-guide.md
index 30865a8..3e3314c 100644
--- a/docs/learn/tutorials/versioned/samza-async-user-guide.md
+++ b/docs/learn/tutorials/versioned/samza-async-user-guide.md
@@ -60,7 +60,7 @@ job.container.thread.pool.size=16
 
 ### Asynchronous Process with AsyncStreamTask API
 
-If your job process is asynchronous, e.g. making non-blocking remote IO calls, [AsyncStreamTask](javadocs/org/apache/samza/task/AsyncStreamTask.html) interface provides the support for it. In the following example AsyncRestTask makes asynchronous rest call and triggers callback once it's complete. 
+If your job process is asynchronous, e.g. making non-blocking remote IO calls, the [AsyncStreamTask](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/task/AsyncStreamTask.html) interface provides support for it. In the following example, AsyncRestTask makes an asynchronous REST call and triggers a callback once it's complete.
 
 {% highlight java %}
 public class AsyncRestTask implements AsyncStreamTask, InitableTask, ClosableTask {
@@ -98,7 +98,7 @@ public class AsyncRestTask implements AsyncStreamTask, InitableTask, ClosableTas
 }
 {% endhighlight %}
 
-In the above example, the process is not complete when processAsync() returns. In the callback thread from Jersey client, we trigger [TaskCallback](javadocs/org/apache/samza/task/TaskCallback.html) to indicate the process is done. In order to make sure the callback will be triggered within certain time interval, e.g. 5 seconds, you can config the following property:
+In the above example, the process is not complete when processAsync() returns. In the callback thread from the Jersey client, we trigger [TaskCallback](/learn/documentation/{{site.version}}/api/javadocs/org/apache/samza/task/TaskCallback.html) to indicate the process is done. In order to make sure the callback will be triggered within a certain time interval, e.g. 5 seconds, you can configure the following property:
 
 {% highlight jproperties %}
 # Timeout for processAsync() callback. When the timeout happens, it will throw a TaskCallbackTimeoutException and shut down the container.

http://git-wip-us.apache.org/repos/asf/samza/blob/bd132538/docs/startup/hello-samza/versioned/index.md
----------------------------------------------------------------------
diff --git a/docs/startup/hello-samza/versioned/index.md b/docs/startup/hello-samza/versioned/index.md
index 89b7ab9..537516b 100644
--- a/docs/startup/hello-samza/versioned/index.md
+++ b/docs/startup/hello-samza/versioned/index.md
@@ -18,7 +18,7 @@ title: Hello Samza
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
-The [hello-samza](https://github.com/apache/samza-hello-samza) project is a stand-alone project designed to help you run your first Samza job.
+The [hello-samza](https://github.com/apache/samza-hello-samza) project is an example project designed to help you run your first Samza job.
 
 ### Get the Code
 
@@ -66,10 +66,10 @@ tar -xvf ./target/hello-samza-0.13.0-SNAPSHOT-dist.tar.gz -C deploy/samza
 
 ### Run a Samza Job
 
-After you've built your Samza package, you can start a job on the grid using the run-job.sh script.
+After you've built your Samza package, you can start a job on the grid using the run-app.sh script.
 
 {% highlight bash %}
-deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
+deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
 {% endhighlight %}
 
 The job will consume a feed of real-time edits from Wikipedia, and produce them to a Kafka topic called "wikipedia-raw". Give the job a minute to start up, and then tail the Kafka topic:
@@ -87,8 +87,8 @@ If you can not see any output from Kafka consumer, you may have connection probl
 Let's calculate some statistics based on the messages in the wikipedia-raw topic. Start two more jobs:
 
 {% highlight bash %}
-deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-parser.properties
-deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-stats.properties
+deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-parser.properties
+deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-stats.properties
 {% endhighlight %}
 
 The first job (wikipedia-parser) parses the messages in wikipedia-raw, and extracts information about the size of the edit, who made the change, etc. You can take a look at its output with:
@@ -116,6 +116,11 @@ If you check the YARN UI, again, you'll see that all three jobs are now listed.
 
 ### Shutdown
 
+To shut down one of the jobs, use the same script with an extra '--operation=kill' argument:
+{% highlight bash %}
+deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties --operation=kill
+{% endhighlight %}
+
 After you're done, you can clean everything up using the same grid script.
 
 {% highlight bash %}