You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by ya...@apache.org on 2014/12/05 19:52:35 UTC

svn commit: r1643389 [23/27] - in /incubator/samza/site: ./ archive/ community/ contribute/ img/0.8/ img/0.8/learn/ img/0.8/learn/documentation/ img/0.8/learn/documentation/comparisons/ img/0.8/learn/documentation/container/ img/0.8/learn/documentation...

Added: incubator/samza/site/learn/documentation/0.8/index.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.8/index.html?rev=1643389&view=auto
==============================================================================
--- incubator/samza/site/learn/documentation/0.8/index.html (added)
+++ incubator/samza/site/learn/documentation/0.8/index.html Fri Dec  5 18:52:32 2014
@@ -0,0 +1,263 @@
+<!DOCTYPE html>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Samza - Documentation</title>
+    <link href='/css/ropa-sans.css' rel='stylesheet' type='text/css'/>
+    <link href="/css/bootstrap.min.css" rel="stylesheet"/>
+    <link href="/css/font-awesome.min.css" rel="stylesheet"/>
+    <link href="/css/main.css" rel="stylesheet"/>
+    <link href="/css/syntax.css" rel="stylesheet"/>
+    <link rel="icon" type="image/png" href="/img/samza-icon.png">
+    <script src="/js/jquery-1.11.1.min.js"></script>
+  </head>
+  <body>
+    <div class="wrapper">
+      <div class="wrapper-content">
+
+        <div class="masthead">
+          <div class="container">
+            <div class="masthead-logo">
+              <a href="/" class="logo">samza</a>
+            </div>
+            <div class="masthead-icons">
+              <div class="pull-right">
+                <a href="/startup/download"><i class="fa fa-arrow-circle-o-down masthead-icon"></i></a>
+                <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-samza.git;a=tree" target="_blank"><i class="fa fa-code masthead-icon" style="font-weight: bold;"></i></a>
+                <a href="https://twitter.com/samzastream" target="_blank"><i class="fa fa-twitter masthead-icon"></i></a>
+                <!-- this icon only shows in versioned pages -->
+                
+                  
+                    
+                  
+                  <a href="http://samza.incubator.apache.org/learn/documentation/latest/index.html"><i id="switch-version-button"></i></a>
+                   <!-- links for the navigation bar -->
+                
+
+              </div>
+            </div>
+          </div><!-- /.container -->
+        </div>
+
+        <div class="container">
+          <div class="menu">
+            <h1><i class="fa fa-rocket"></i> Getting Started</h1>
+            <ul>
+              <li><a href="/startup/hello-samza/0.8">Hello Samza</a></li>
+              <li><a href="/startup/download">Download</a></li>
+            </ul>
+
+            <h1><i class="fa fa-book"></i> Learn</h1>
+            <ul>
+              <li><a href="/learn/documentation/0.8">Documentation</a></li>
+              <li><a href="/learn/documentation/0.8/jobs/configuration-table.html">Configuration</a></li>
+              <li><a href="/learn/documentation/0.8/api/javadocs/">Javadocs</a></li>
+              <li><a href="/learn/tutorials/0.8">Tutorials</a></li>
+              <li><a href="http://wiki.apache.org/samza/FAQ">FAQ</a></li>
+              <li><a href="http://wiki.apache.org/samza">Wiki</a></li>
+              <li><a href="http://wiki.apache.org/samza/PapersAndTalks">Papers &amp; Talks</a></li>
+              <li><a href="http://blogs.apache.org/samza">Blog</a></li>
+            </ul>
+
+            <h1><i class="fa fa-comments"></i> Community</h1>
+            <ul>
+              <li><a href="/community/mailing-lists.html">Mailing Lists</a></li>
+              <li><a href="/community/irc.html">IRC</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/SAMZA">Bugs</a></li>
+              <li><a href="http://wiki.apache.org/samza/PoweredBy">Powered by</a></li>
+              <li><a href="http://wiki.apache.org/samza/Ecosystem">Ecosystem</a></li>
+              <li><a href="/community/committers.html">Committers</a></li>
+            </ul>
+
+            <h1><i class="fa fa-code"></i> Contribute</h1>
+            <ul>
+              <li><a href="/contribute/rules.html">Rules</a></li>
+              <li><a href="/contribute/coding-guide.html">Coding Guide</a></li>
+              <li><a href="/contribute/projects.html">Projects</a></li>
+              <li><a href="/contribute/design-documents.html">Design Documents</a></li>
+              <li><a href="/contribute/code.html">Code</a></li>
+              <li><a href="https://reviews.apache.org/groups/samza">Review Board</a></li>
+              <li><a href="/contribute/tests.html">Tests</a></li>
+              <li><a href="/contribute/disclaimer.html">Disclaimer</a></li>
+            </ul>
+
+            <h1><i class="fa fa-history"></i> Archive</h1>
+            <ul>
+              <li><a href="/archive/index.html#latest">latest</a></li>
+              <li><a href="/archive/index.html#08">0.8</a></li>
+              <li><a href="/archive/index.html#07">0.7</a></li>
+            </ul>
+          </div>
+
+          <div class="content">
+            <!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<h2>Documentation</h2>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<h4>Introduction</h4>
+
+<ul class="documentation-list">
+  <li><a href="introduction/background.html">Background</a></li>
+  <li><a href="introduction/concepts.html">Concepts</a></li>
+  <li><a href="introduction/architecture.html">Architecture</a></li>
+</ul>
+
+<h4>Comparisons</h4>
+
+<ul class="documentation-list">
+  <li><a href="comparisons/introduction.html">Introduction</a></li>
+  <li><a href="comparisons/mupd8.html">MUPD8</a></li>
+  <li><a href="comparisons/storm.html">Storm</a></li>
+  <li><a href="comparisons/spark-streaming.html">Spark Streaming</a></li>
+<!-- TODO comparisons pages
+  <li><a href="comparisons/aurora.html">Aurora</a></li>
+  <li><a href="comparisons/jms.html">JMS</a></li>
+  <li><a href="comparisons/s4.html">S4</a></li>
+-->
+</ul>
+
+<h4>API</h4>
+
+<ul class="documentation-list">
+  <li><a href="api/overview.html">Overview</a></li>
+  <li><a href="jobs/configuration-table.html">Configuration</a></li>
+  <li><a href="api/javadocs">Javadocs</a></li>
+</ul>
+
+<h4>Container</h4>
+
+<ul class="documentation-list">
+  <li><a href="container/samza-container.html">SamzaContainer</a></li>
+  <li><a href="container/streams.html">Streams</a></li>
+  <li><a href="container/serialization.html">Serialization</a></li>
+  <li><a href="container/checkpointing.html">Checkpointing</a></li>
+  <li><a href="container/state-management.html">State Management</a></li>
+  <li><a href="container/metrics.html">Metrics</a></li>
+  <li><a href="container/windowing.html">Windowing</a></li>
+  <li><a href="container/event-loop.html">Event Loop</a></li>
+  <li><a href="container/jmx.html">JMX</a></li>
+</ul>
+
+<h4>Jobs</h4>
+
+<ul class="documentation-list">
+  <li><a href="jobs/job-runner.html">JobRunner</a></li>
+  <li><a href="jobs/configuration.html">Configuration</a></li>
+  <li><a href="jobs/packaging.html">Packaging</a></li>
+  <li><a href="jobs/yarn-jobs.html">YARN Jobs</a></li>
+  <li><a href="jobs/logging.html">Logging</a></li>
+  <li><a href="jobs/reprocessing.html">Reprocessing</a></li>
+</ul>
+
+<h4>YARN</h4>
+
+<ul class="documentation-list">
+  <li><a href="yarn/application-master.html">Application Master</a></li>
+  <li><a href="yarn/isolation.html">Isolation</a></li>
+<!-- TODO write yarn pages
+  <li><a href="">Fault Tolerance</a></li>
+  <li><a href="">Security</a></li>
+-->
+</ul>
+
+<h4>Operations</h4>
+
+<ul class="documentation-list">
+  <li><a href="operations/security.html">Security</a></li>
+  <li><a href="operations/kafka.html">Kafka</a></li>
+</div>
+
+
+          </div>
+        </div>
+
+      </div><!-- /.wrapper-content -->
+    </div><!-- /.wrapper -->
+
+    <div class="footer">
+      <div class="container">
+        <!-- nothing for now. -->
+      </div>
+    </div>
+
+  
+    <script>
+      $( document ).ready(function() {
+        if ( $.fn.urlExists( "/learn/documentation/latest/index.html" ) ) {
+          $("#switch-version-button").addClass("fa fa-history masthead-icon");
+        }
+      });
+
+      /* a function to test whether the url exists or not */
+      (function( $ ) {
+        $.fn.urlExists = function(url) {
+          var http = new XMLHttpRequest();
+          http.open('HEAD', url, false);
+          http.send();
+          return http.status != 404;
+        };
+      }( jQuery ));
+    </script>
+  
+
+    <!-- Google Analytics -->
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-43122768-1', 'apache.org');
+      ga('send', 'pageview');
+
+    </script>
+  </body>
+</html>

Added: incubator/samza/site/learn/documentation/0.8/introduction/architecture.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.8/introduction/architecture.html?rev=1643389&view=auto
==============================================================================
--- incubator/samza/site/learn/documentation/0.8/introduction/architecture.html (added)
+++ incubator/samza/site/learn/documentation/0.8/introduction/architecture.html Fri Dec  5 18:52:32 2014
@@ -0,0 +1,286 @@
+<!DOCTYPE html>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Samza - Architecture</title>
+    <link href='/css/ropa-sans.css' rel='stylesheet' type='text/css'/>
+    <link href="/css/bootstrap.min.css" rel="stylesheet"/>
+    <link href="/css/font-awesome.min.css" rel="stylesheet"/>
+    <link href="/css/main.css" rel="stylesheet"/>
+    <link href="/css/syntax.css" rel="stylesheet"/>
+    <link rel="icon" type="image/png" href="/img/samza-icon.png">
+    <script src="/js/jquery-1.11.1.min.js"></script>
+  </head>
+  <body>
+    <div class="wrapper">
+      <div class="wrapper-content">
+
+        <div class="masthead">
+          <div class="container">
+            <div class="masthead-logo">
+              <a href="/" class="logo">samza</a>
+            </div>
+            <div class="masthead-icons">
+              <div class="pull-right">
+                <a href="/startup/download"><i class="fa fa-arrow-circle-o-down masthead-icon"></i></a>
+                <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-samza.git;a=tree" target="_blank"><i class="fa fa-code masthead-icon" style="font-weight: bold;"></i></a>
+                <a href="https://twitter.com/samzastream" target="_blank"><i class="fa fa-twitter masthead-icon"></i></a>
+                <!-- this icon only shows in versioned pages -->
+                
+                  
+                    
+                  
+                  <a href="http://samza.incubator.apache.org/learn/documentation/latest/introduction/architecture.html"><i id="switch-version-button"></i></a>
+                   <!-- links for the navigation bar -->
+                
+
+              </div>
+            </div>
+          </div><!-- /.container -->
+        </div>
+
+        <div class="container">
+          <div class="menu">
+            <h1><i class="fa fa-rocket"></i> Getting Started</h1>
+            <ul>
+              <li><a href="/startup/hello-samza/0.8">Hello Samza</a></li>
+              <li><a href="/startup/download">Download</a></li>
+            </ul>
+
+            <h1><i class="fa fa-book"></i> Learn</h1>
+            <ul>
+              <li><a href="/learn/documentation/0.8">Documentation</a></li>
+              <li><a href="/learn/documentation/0.8/jobs/configuration-table.html">Configuration</a></li>
+              <li><a href="/learn/documentation/0.8/api/javadocs/">Javadocs</a></li>
+              <li><a href="/learn/tutorials/0.8">Tutorials</a></li>
+              <li><a href="http://wiki.apache.org/samza/FAQ">FAQ</a></li>
+              <li><a href="http://wiki.apache.org/samza">Wiki</a></li>
+              <li><a href="http://wiki.apache.org/samza/PapersAndTalks">Papers &amp; Talks</a></li>
+              <li><a href="http://blogs.apache.org/samza">Blog</a></li>
+            </ul>
+
+            <h1><i class="fa fa-comments"></i> Community</h1>
+            <ul>
+              <li><a href="/community/mailing-lists.html">Mailing Lists</a></li>
+              <li><a href="/community/irc.html">IRC</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/SAMZA">Bugs</a></li>
+              <li><a href="http://wiki.apache.org/samza/PoweredBy">Powered by</a></li>
+              <li><a href="http://wiki.apache.org/samza/Ecosystem">Ecosystem</a></li>
+              <li><a href="/community/committers.html">Committers</a></li>
+            </ul>
+
+            <h1><i class="fa fa-code"></i> Contribute</h1>
+            <ul>
+              <li><a href="/contribute/rules.html">Rules</a></li>
+              <li><a href="/contribute/coding-guide.html">Coding Guide</a></li>
+              <li><a href="/contribute/projects.html">Projects</a></li>
+              <li><a href="/contribute/design-documents.html">Design Documents</a></li>
+              <li><a href="/contribute/code.html">Code</a></li>
+              <li><a href="https://reviews.apache.org/groups/samza">Review Board</a></li>
+              <li><a href="/contribute/tests.html">Tests</a></li>
+              <li><a href="/contribute/disclaimer.html">Disclaimer</a></li>
+            </ul>
+
+            <h1><i class="fa fa-history"></i> Archive</h1>
+            <ul>
+              <li><a href="/archive/index.html#latest">latest</a></li>
+              <li><a href="/archive/index.html#08">0.8</a></li>
+              <li><a href="/archive/index.html#07">0.7</a></li>
+            </ul>
+          </div>
+
+          <div class="content">
+            <!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<h2>Architecture</h2>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<p>Samza is made up of three layers:</p>
+
+<ol>
+<li>A streaming layer.</li>
+<li>An execution layer.</li>
+<li>A processing layer.</li>
+</ol>
+
+<p>Samza provides out of the box support for all three layers.</p>
+
+<ol>
+<li><strong>Streaming:</strong> <a href="http://kafka.apache.org/">Kafka</a></li>
+<li><strong>Execution:</strong> <a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">YARN</a></li>
+<li><strong>Processing:</strong> <a href="../api/overview.html">Samza API</a></li>
+</ol>
+
+<p>These three pieces fit together to form Samza:</p>
+
+<p><img src="/img/0.8/learn/documentation/introduction/samza-ecosystem.png" alt="diagram-medium"></p>
+
+<p>This architecture follows a similar pattern to Hadoop (which also uses YARN as execution layer, HDFS for storage, and MapReduce as processing API):</p>
+
+<p><img src="/img/0.8/learn/documentation/introduction/samza-hadoop.png" alt="diagram-medium"></p>
+
+<p>Before going in-depth on each of these three layers, it should be noted that Samza&rsquo;s support is not limited to Kafka and YARN. Both Samza&rsquo;s execution and streaming layer are pluggable, and allow developers to implement alternatives if they prefer.</p>
+
+<h3 id="kafka">Kafka</h3>
+
+<p><a href="http://kafka.apache.org/">Kafka</a> is a distributed pub/sub and message queueing system that provides at-least once messaging guarantees (i.e. the system guarantees that no messages are lost, but in certain fault scenarios, a consumer might receive the same message more than once), and highly available partitions (i.e. a stream&rsquo;s partitions continue to be available even if a machine goes down).</p>
+
+<p>In Kafka, each stream is called a <em>topic</em>. Each topic is partitioned and replicated across multiple machines called <em>brokers</em>. When a <em>producer</em> sends a message to a topic, it provides a key, which is used to determine which partition the message should be sent to. The Kafka brokers receive and store the messages that the producer sends. Kafka <em>consumers</em> can then read from a topic by subscribing to messages on all partitions of a topic.</p>
+
+<p>Kafka has some interesting properties: </p>
+
+<ul>
+<li>All messages with the same key are guaranteed to be in the same topic partition. This means that if you wish to read all messages for a specific user ID, you only have to read the messages from the partition that contains the user ID, not the whole topic (assuming the user ID is used as key).</li>
+<li>A topic partition is a sequence of messages in order of arrival, so you can reference any message in the partition using a monotonically increasing <em>offset</em> (like an index into an array). This means that the broker doesn&rsquo;t need to keep track of which messages have been seen by a particular consumer &mdash; the consumer can keep track itself by storing the offset of the last message it has processed. It then knows that every message with a lower offset than the current offset has already been processed; every message with a higher offset has not yet been processed.</li>
+</ul>
+
+<p>For more details on Kafka, see Kafka&rsquo;s <a href="http://kafka.apache.org/documentation.html">documentation</a> pages.</p>
+
+<h3 id="yarn">YARN</h3>
+
+<p><a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">YARN</a> (Yet Another Resource Negotiator) is Hadoop&rsquo;s next-generation cluster scheduler. It allows you to allocate a number of <em>containers</em> (processes) in a cluster of machines, and execute arbitrary commands on them.</p>
+
+<p>When an application interacts with YARN, it looks something like this:</p>
+
+<ol>
+<li><strong>Application</strong>: I want to run command X on two machines with 512MB memory.</li>
+<li><strong>YARN</strong>: Cool, where&rsquo;s your code?</li>
+<li><strong>Application</strong>: http://path.to.host/jobs/download/my.tgz</li>
+<li><strong>YARN</strong>: I&rsquo;m running your job on node-1.grid and node-2.grid.</li>
+</ol>
+
+<p>Samza uses YARN to manage deployment, fault tolerance, logging, resource isolation, security, and locality. A brief overview of YARN is below; see <a href="http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/">this page from Hortonworks</a> for a much better overview.</p>
+
+<h4 id="yarn-architecture">YARN Architecture</h4>
+
+<p>YARN has three important pieces: a <em>ResourceManager</em>, a <em>NodeManager</em>, and an <em>ApplicationMaster</em>. In a YARN grid, every machine runs a NodeManager, which is responsible for launching processes on that machine. A ResourceManager talks to all of the NodeManagers to tell them what to run. Applications, in turn, talk to the ResourceManager when they wish to run something on the cluster. The third piece, the ApplicationMaster, is actually application-specific code that runs in the YARN cluster. It&rsquo;s responsible for managing the application&rsquo;s workload, asking for containers (usually UNIX processes), and handling notifications when one of its containers fails.</p>
+
+<h4 id="samza-and-yarn">Samza and YARN</h4>
+
+<p>Samza provides a YARN ApplicationMaster and a YARN job runner out of the box. The integration between Samza and YARN is outlined in the following diagram (different colors indicate different host machines):</p>
+
+<p><img src="/img/0.8/learn/documentation/introduction/samza-yarn-integration.png" alt="diagram-small"></p>
+
+<p>The Samza client talks to the YARN RM when it wants to start a new Samza job. The YARN RM talks to a YARN NM to allocate space on the cluster for Samza&rsquo;s ApplicationMaster. Once the NM allocates space, it starts the Samza AM. After the Samza AM starts, it asks the YARN RM for one or more YARN containers to run <a href="../container/samza-container.html">SamzaContainers</a>. Again, the RM works with NMs to allocate space for the containers. Once the space has been allocated, the NMs start the Samza containers.</p>
+
+<h3 id="samza">Samza</h3>
+
+<p>Samza uses YARN and Kafka to provide a framework for stage-wise stream processing and partitioning. Everything, put together, looks like this (different colors indicate different host machines):</p>
+
+<p><img src="/img/0.8/learn/documentation/introduction/samza-yarn-kafka-integration.png" alt="diagram-small"></p>
+
+<p>The Samza client uses YARN to run a Samza job: YARN starts and supervises one or more <a href="../container/samza-container.html">SamzaContainers</a>, and your processing code (using the <a href="../api/overview.html">StreamTask</a> API) runs inside those containers. The input and output for the Samza StreamTasks come from Kafka brokers that are (usually) co-located on the same machines as the YARN NMs.</p>
+
+<h3 id="example">Example</h3>
+
+<p>Let&rsquo;s take a look at a real example: suppose we want to count the number of page views. In SQL, you would write something like:</p>
+
+<div class="highlight"><pre><code class="sql"><span class="k">SELECT</span> <span class="n">user_id</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">PageViewEvent</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">user_id</span></code></pre></div>
+
+<p>Although Samza doesn&rsquo;t support SQL right now, the idea is the same. Two jobs are required to calculate this query: one to group messages by user ID, and the other to do the counting.</p>
+
+<p>In the first job, the grouping is done by sending all messages with the same user ID to the same partition of an intermediate topic. You can do this by using the user ID as key of the messages that are emitted by the first job, and this key is mapped to one of the intermediate topic&rsquo;s partitions (usually by taking a hash of the key mod the number of partitions). The second job consumes the intermediate topic. Each task in the second job consumes one partition of the intermediate topic, i.e. all the messages for a subset of user IDs. The task has a counter for each user ID in its partition, and the appropriate counter is incremented every time the task receives a message with a particular user ID.</p>
+
+<p><img src="/img/0.8/learn/documentation/introduction/group-by-example.png" alt="Repartitioning for a GROUP BY" class="diagram-large"></p>
+
+<p>If you are familiar with Hadoop, you may recognize this as a Map/Reduce operation, where each record is associated with a particular key in the mappers, records with the same key are grouped together by the framework, and then counted in the reduce step. The difference between Hadoop and Samza is that Hadoop operates on a fixed input, whereas Samza works with unbounded streams of data.</p>
+
+<p>Kafka takes the messages emitted by the first job and buffers them on disk, distributed across multiple machines. This helps make the system fault-tolerant: if one machine fails, no messages are lost, because they have been replicated to other machines. And if the second job goes slow or stops consuming messages for any reason, the first job is unaffected: the disk buffer can absorb the backlog of messages from the first job until the second job catches up again.</p>
+
+<p>By partitioning topics, and by breaking a stream process down into jobs and parallel tasks that run on multiple machines, Samza scales to streams with very high message throughput. By using YARN and Kafka, Samza achieves fault-tolerance: if a process or machine fails, it is automatically restarted on another machine and continues processing messages from the point where it left off.</p>
+
+<h2 id="comparison-introduction-&raquo;"><a href="../comparisons/introduction.html">Comparison Introduction &raquo;</a></h2>
+
+
+          </div>
+        </div>
+
+      </div><!-- /.wrapper-content -->
+    </div><!-- /.wrapper -->
+
+    <div class="footer">
+      <div class="container">
+        <!-- nothing for now. -->
+      </div>
+    </div>
+
+  
+    <script>
+      $( document ).ready(function() {
+        if ( $.fn.urlExists( "/learn/documentation/latest/introduction/architecture.html" ) ) {
+          $("#switch-version-button").addClass("fa fa-history masthead-icon");
+        }
+      });
+
+      /* a function to test whether the url exists or not */
+      (function( $ ) {
+        $.fn.urlExists = function(url) {
+          var http = new XMLHttpRequest();
+          http.open('HEAD', url, false);
+          http.send();
+          return http.status != 404;
+        };
+      }( jQuery ));
+    </script>
+  
+
+    <!-- Google Analytics -->
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-43122768-1', 'apache.org');
+      ga('send', 'pageview');
+
+    </script>
+  </body>
+</html>

Added: incubator/samza/site/learn/documentation/0.8/introduction/background.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.8/introduction/background.html?rev=1643389&view=auto
==============================================================================
--- incubator/samza/site/learn/documentation/0.8/introduction/background.html (added)
+++ incubator/samza/site/learn/documentation/0.8/introduction/background.html Fri Dec  5 18:52:32 2014
@@ -0,0 +1,247 @@
+<!DOCTYPE html>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Samza - Background</title>
+    <link href='/css/ropa-sans.css' rel='stylesheet' type='text/css'/>
+    <link href="/css/bootstrap.min.css" rel="stylesheet"/>
+    <link href="/css/font-awesome.min.css" rel="stylesheet"/>
+    <link href="/css/main.css" rel="stylesheet"/>
+    <link href="/css/syntax.css" rel="stylesheet"/>
+    <link rel="icon" type="image/png" href="/img/samza-icon.png">
+    <script src="/js/jquery-1.11.1.min.js"></script>
+  </head>
+  <body>
+    <div class="wrapper">
+      <div class="wrapper-content">
+
+        <div class="masthead">
+          <div class="container">
+            <div class="masthead-logo">
+              <a href="/" class="logo">samza</a>
+            </div>
+            <div class="masthead-icons">
+              <div class="pull-right">
+                <a href="/startup/download"><i class="fa fa-arrow-circle-o-down masthead-icon"></i></a>
+                <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-samza.git;a=tree" target="_blank"><i class="fa fa-code masthead-icon" style="font-weight: bold;"></i></a>
+                <a href="https://twitter.com/samzastream" target="_blank"><i class="fa fa-twitter masthead-icon"></i></a>
+                <!-- this icon only shows in versioned pages -->
+                
+                  
+                    
+                  
+                  <a href="http://samza.incubator.apache.org/learn/documentation/latest/introduction/background.html"><i id="switch-version-button"></i></a>
+                   <!-- links for the navigation bar -->
+                
+
+              </div>
+            </div>
+          </div><!-- /.container -->
+        </div>
+
+        <div class="container">
+          <div class="menu">
+            <h1><i class="fa fa-rocket"></i> Getting Started</h1>
+            <ul>
+              <li><a href="/startup/hello-samza/0.8">Hello Samza</a></li>
+              <li><a href="/startup/download">Download</a></li>
+            </ul>
+
+            <h1><i class="fa fa-book"></i> Learn</h1>
+            <ul>
+              <li><a href="/learn/documentation/0.8">Documentation</a></li>
+              <li><a href="/learn/documentation/0.8/jobs/configuration-table.html">Configuration</a></li>
+              <li><a href="/learn/documentation/0.8/api/javadocs/">Javadocs</a></li>
+              <li><a href="/learn/tutorials/0.8">Tutorials</a></li>
+              <li><a href="http://wiki.apache.org/samza/FAQ">FAQ</a></li>
+              <li><a href="http://wiki.apache.org/samza">Wiki</a></li>
+              <li><a href="http://wiki.apache.org/samza/PapersAndTalks">Papers &amp; Talks</a></li>
+              <li><a href="http://blogs.apache.org/samza">Blog</a></li>
+            </ul>
+
+            <h1><i class="fa fa-comments"></i> Community</h1>
+            <ul>
+              <li><a href="/community/mailing-lists.html">Mailing Lists</a></li>
+              <li><a href="/community/irc.html">IRC</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/SAMZA">Bugs</a></li>
+              <li><a href="http://wiki.apache.org/samza/PoweredBy">Powered by</a></li>
+              <li><a href="http://wiki.apache.org/samza/Ecosystem">Ecosystem</a></li>
+              <li><a href="/community/committers.html">Committers</a></li>
+            </ul>
+
+            <h1><i class="fa fa-code"></i> Contribute</h1>
+            <ul>
+              <li><a href="/contribute/rules.html">Rules</a></li>
+              <li><a href="/contribute/coding-guide.html">Coding Guide</a></li>
+              <li><a href="/contribute/projects.html">Projects</a></li>
+              <li><a href="/contribute/design-documents.html">Design Documents</a></li>
+              <li><a href="/contribute/code.html">Code</a></li>
+              <li><a href="https://reviews.apache.org/groups/samza">Review Board</a></li>
+              <li><a href="/contribute/tests.html">Tests</a></li>
+              <li><a href="/contribute/disclaimer.html">Disclaimer</a></li>
+            </ul>
+
+            <h1><i class="fa fa-history"></i> Archive</h1>
+            <ul>
+              <li><a href="/archive/index.html#latest">latest</a></li>
+              <li><a href="/archive/index.html#08">0.8</a></li>
+              <li><a href="/archive/index.html#07">0.7</a></li>
+            </ul>
+          </div>
+
+          <div class="content">
+            <!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<h2>Background</h2>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<p>This page provides some background about stream processing, describes what Samza is, and why it was built.</p>
+
+<h3 id="what-is-messaging?">What is messaging?</h3>
+
+<p>Messaging systems are a popular way of implementing near-realtime asynchronous computation. Messages can be added to a message queue (ActiveMQ, RabbitMQ), pub-sub system (Kestrel, Kafka), or log aggregation system (Flume, Scribe) when something happens. Downstream <em>consumers</em> read messages from these systems, and process them or take actions based on the message contents.</p>
+
+<p>Suppose you have a website, and every time someone loads a page, you send a &ldquo;user viewed page&rdquo; event to a messaging system. You might then have consumers which do any of the following:</p>
+
+<ul>
+<li>Store the message in Hadoop for future analysis</li>
+<li>Count page views and update a dashboard</li>
+<li>Trigger an alert if a page view fails</li>
+<li>Send an email notification to another user</li>
+<li>Join the page view event with the user&rsquo;s profile, and send the message back to the messaging system</li>
+</ul>
+
+<p>A messaging system lets you decouple all of this work from the actual web page serving.</p>
+
+<h3 id="what-is-stream-processing?">What is stream processing?</h3>
+
+<p>A messaging system is a fairly low-level piece of infrastructure&mdash;it stores messages and waits for consumers to consume them. When you start writing code that produces or consumes messages, you quickly find that there are a lot of tricky problems that have to be solved in the processing layer. Samza aims to help with these problems.</p>
+
+<p>Consider the counting example, above (count page views and update a dashboard). What happens when the machine that your consumer is running on fails, and your current counter values are lost? How do you recover? Where should the processor be run when it restarts? What if the underlying messaging system sends you the same message twice, or loses a message? (Unless you are careful, your counts will be incorrect.) What if you want to count page views grouped by the page URL? How do you distribute the computation across multiple machines if it&rsquo;s too much for a single machine to handle?</p>
+
+<p>Stream processing is a higher level of abstraction on top of messaging systems, and it&rsquo;s meant to address precisely this category of problems.</p>
+
+<h3 id="samza">Samza</h3>
+
+<p>Samza is a stream processing framework with the following features:</p>
+
+<ul>
+<li><strong>Simple API:</strong> Unlike most low-level messaging system APIs, Samza provides a very simple callback-based &ldquo;process message&rdquo; API comparable to MapReduce.</li>
+<li><strong>Managed state:</strong> Samza manages snapshotting and restoration of a stream processor&rsquo;s state. When the processor is restarted, Samza restores its state to a consistent snapshot. Samza is built to handle large amounts of state (many gigabytes per partition).</li>
+<li><strong>Fault tolerance:</strong> Whenever a machine in the cluster fails, Samza works with YARN to transparently migrate your tasks to another machine.</li>
+<li><strong>Durability:</strong> Samza uses Kafka to guarantee that messages are processed in the order they were written to a partition, and that no messages are ever lost.</li>
+<li><strong>Scalability:</strong> Samza is partitioned and distributed at every level. Kafka provides ordered, partitioned, replayable, fault-tolerant streams. YARN provides a distributed environment for Samza containers to run in.</li>
+<li><strong>Pluggable:</strong> Though Samza works out of the box with Kafka and YARN, Samza provides a pluggable API that lets you run Samza with other messaging systems and execution environments.</li>
+<li><strong>Processor isolation:</strong> Samza works with Apache YARN, which supports Hadoop&rsquo;s security model, and resource isolation through Linux CGroups.</li>
+</ul>
+
+<h3 id="alternatives">Alternatives</h3>
+
+<p>The available open source stream processing systems are actually quite young, and no single system offers a complete solution. New problems in this area include: how a stream processor&rsquo;s state should be managed, whether or not a stream should be buffered remotely on disk, what to do when duplicate messages are received or messages are lost, and how to model underlying messaging systems.</p>
+
+<p>Samza&rsquo;s main differentiators are:</p>
+
+<ul>
+<li>Samza supports fault-tolerant local state. State can be thought of as tables that are split up and co-located with the processing tasks. State is itself modeled as a stream. If the local state is lost due to machine failure, the state stream is replayed to restore it.</li>
+<li>Streams are ordered, partitioned, replayable, and fault tolerant.</li>
+<li>YARN is used for processor isolation, security, and fault tolerance.</li>
+<li>Jobs are decoupled: if one job goes slow and builds up a backlog of unprocessed messages, the rest of the system is not affected.</li>
+</ul>
+
+<p>For a more in-depth discussion on Samza, and how it relates to other stream processing systems, have a look at Samza&rsquo;s <a href="../comparisons/introduction.html">Comparisons</a> documentation.</p>
+
+<h2 id="concepts-&raquo;"><a href="concepts.html">Concepts &raquo;</a></h2>
+
+
+          </div>
+        </div>
+
+      </div><!-- /.wrapper-content -->
+    </div><!-- /.wrapper -->
+
+    <div class="footer">
+      <div class="container">
+        <!-- nothing for now. -->
+      </div>
+    </div>
+
+  
+    <script>
+      $( document ).ready(function() {
+        if ( $.fn.urlExists( "/learn/documentation/latest/introduction/background.html" ) ) {
+          $("#switch-version-button").addClass("fa fa-history masthead-icon");
+        }
+      });
+
+      /* a function to test whether the url exists or not */
+      (function( $ ) {
+        $.fn.urlExists = function(url) {
+          var http = new XMLHttpRequest();
+          http.open('HEAD', url, false);
+          http.send();
+          return http.status != 404;
+        };
+      }( jQuery ));
+    </script>
+  
+
+    <!-- Google Analytics -->
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-43122768-1', 'apache.org');
+      ga('send', 'pageview');
+
+    </script>
+  </body>
+</html>

Added: incubator/samza/site/learn/documentation/0.8/introduction/concepts.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.8/introduction/concepts.html?rev=1643389&view=auto
==============================================================================
--- incubator/samza/site/learn/documentation/0.8/introduction/concepts.html (added)
+++ incubator/samza/site/learn/documentation/0.8/introduction/concepts.html Fri Dec  5 18:52:32 2014
@@ -0,0 +1,242 @@
+<!DOCTYPE html>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Samza - Concepts</title>
+    <link href='/css/ropa-sans.css' rel='stylesheet' type='text/css'/>
+    <link href="/css/bootstrap.min.css" rel="stylesheet"/>
+    <link href="/css/font-awesome.min.css" rel="stylesheet"/>
+    <link href="/css/main.css" rel="stylesheet"/>
+    <link href="/css/syntax.css" rel="stylesheet"/>
+    <link rel="icon" type="image/png" href="/img/samza-icon.png">
+    <script src="/js/jquery-1.11.1.min.js"></script>
+  </head>
+  <body>
+    <div class="wrapper">
+      <div class="wrapper-content">
+
+        <div class="masthead">
+          <div class="container">
+            <div class="masthead-logo">
+              <a href="/" class="logo">samza</a>
+            </div>
+            <div class="masthead-icons">
+              <div class="pull-right">
+                <a href="/startup/download"><i class="fa fa-arrow-circle-o-down masthead-icon"></i></a>
+                <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-samza.git;a=tree" target="_blank"><i class="fa fa-code masthead-icon" style="font-weight: bold;"></i></a>
+                <a href="https://twitter.com/samzastream" target="_blank"><i class="fa fa-twitter masthead-icon"></i></a>
+                <!-- this icon only shows in versioned pages -->
+                
+                  
+                    
+                  
+                  <a href="http://samza.incubator.apache.org/learn/documentation/latest/introduction/concepts.html"><i id="switch-version-button"></i></a>
+                   <!-- links for the navigation bar -->
+                
+
+              </div>
+            </div>
+          </div><!-- /.container -->
+        </div>
+
+        <div class="container">
+          <div class="menu">
+            <h1><i class="fa fa-rocket"></i> Getting Started</h1>
+            <ul>
+              <li><a href="/startup/hello-samza/0.8">Hello Samza</a></li>
+              <li><a href="/startup/download">Download</a></li>
+            </ul>
+
+            <h1><i class="fa fa-book"></i> Learn</h1>
+            <ul>
+              <li><a href="/learn/documentation/0.8">Documentation</a></li>
+              <li><a href="/learn/documentation/0.8/jobs/configuration-table.html">Configuration</a></li>
+              <li><a href="/learn/documentation/0.8/api/javadocs/">Javadocs</a></li>
+              <li><a href="/learn/tutorials/0.8">Tutorials</a></li>
+              <li><a href="http://wiki.apache.org/samza/FAQ">FAQ</a></li>
+              <li><a href="http://wiki.apache.org/samza">Wiki</a></li>
+              <li><a href="http://wiki.apache.org/samza/PapersAndTalks">Papers &amp; Talks</a></li>
+              <li><a href="http://blogs.apache.org/samza">Blog</a></li>
+            </ul>
+
+            <h1><i class="fa fa-comments"></i> Community</h1>
+            <ul>
+              <li><a href="/community/mailing-lists.html">Mailing Lists</a></li>
+              <li><a href="/community/irc.html">IRC</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/SAMZA">Bugs</a></li>
+              <li><a href="http://wiki.apache.org/samza/PoweredBy">Powered by</a></li>
+              <li><a href="http://wiki.apache.org/samza/Ecosystem">Ecosystem</a></li>
+              <li><a href="/community/committers.html">Committers</a></li>
+            </ul>
+
+            <h1><i class="fa fa-code"></i> Contribute</h1>
+            <ul>
+              <li><a href="/contribute/rules.html">Rules</a></li>
+              <li><a href="/contribute/coding-guide.html">Coding Guide</a></li>
+              <li><a href="/contribute/projects.html">Projects</a></li>
+              <li><a href="/contribute/design-documents.html">Design Documents</a></li>
+              <li><a href="/contribute/code.html">Code</a></li>
+              <li><a href="https://reviews.apache.org/groups/samza">Review Board</a></li>
+              <li><a href="/contribute/tests.html">Tests</a></li>
+              <li><a href="/contribute/disclaimer.html">Disclaimer</a></li>
+            </ul>
+
+            <h1><i class="fa fa-history"></i> Archive</h1>
+            <ul>
+              <li><a href="/archive/index.html#latest">latest</a></li>
+              <li><a href="/archive/index.html#08">0.8</a></li>
+              <li><a href="/archive/index.html#07">0.7</a></li>
+            </ul>
+          </div>
+
+          <div class="content">
+            <!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<h2>Concepts</h2>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<p>This page gives an introduction to the high-level concepts in Samza.</p>
+
+<h3 id="streams">Streams</h3>
+
+<p>Samza processes <em>streams</em>. A stream is composed of immutable <em>messages</em> of a similar type or category. For example, a stream could be all the clicks on a website, or all the updates to a particular database table, or all the logs produced by a service, or any other type of event data. Messages can be appended to a stream or read from a stream. A stream can have any number of <em>consumers</em>, and reading from a stream doesn&rsquo;t delete the message (so each message is effectively broadcast to all consumers). Messages can optionally have an associated key which is used for partitioning, which we&rsquo;ll talk about in a second.</p>
+
+<p>Samza supports pluggable <em>systems</em> that implement the stream abstraction: in <a href="https://kafka.apache.org/">Kafka</a> a stream is a topic, in a database we might read a stream by consuming updates from a table, in Hadoop we might tail a directory of files in HDFS.</p>
+
+<p><img src="/img/0.8/learn/documentation/introduction/job.png" alt="job"></p>
+
+<h3 id="jobs">Jobs</h3>
+
+<p>A Samza <em>job</em> is code that performs a logical transformation on a set of input streams to append output messages to set of output streams.</p>
+
+<p>If scalability were not a concern, streams and jobs would be all we need. However, in order to scale the throughput of the stream processor, we chop streams and jobs up into smaller units of parallelism: <em>partitions</em> and <em>tasks</em>.</p>
+
+<h3 id="partitions">Partitions</h3>
+
+<p>Each stream is broken into one or more partitions. Each partition in the stream is a totally ordered sequence of messages.</p>
+
+<p>Each message in this sequence has an identifier called the <em>offset</em>, which is unique per partition. The offset can be a sequential integer, byte offset, or string depending on the underlying system implementation.</p>
+
+<p>When a message is appended to a stream, it is appended to only one of the stream&rsquo;s partitions. The assignment of the message to its partition is done with a key chosen by the writer. For example, if the user ID is used as the key, that ensures that all messages related to a particular user end up in the same partition.</p>
+
+<p><img src="/img/0.8/learn/documentation/introduction/stream.png" alt="stream"></p>
+
+<h3 id="tasks">Tasks</h3>
+
+<p>A job is scaled by breaking it into multiple <em>tasks</em>. The <em>task</em> is the unit of parallelism of the job, just as the partition is to the stream. Each task consumes data from one partition for each of the job&rsquo;s input streams.</p>
+
+<p>A task processes messages from each of its input partitions sequentially, in the order of message offset. There is no defined ordering across partitions. This allows each task to operate independently. The YARN scheduler assigns each task to a machine, so the job as a whole can be distributed across many machines.</p>
+
+<p>The number of tasks in a job is determined by the number of input partitions (there cannot be more tasks than input partitions, or there would be some tasks with no input). However, you can change the computational resources assigned to the job (the amount of memory, number of CPU cores, etc.) to satisfy the job&rsquo;s needs. See notes on <em>containers</em> below.</p>
+
+<p>The assignment of partitions to tasks never changes: if a task is on a machine that fails, the task is restarted elsewhere, still consuming the same stream partitions.</p>
+
+<p><img src="/img/0.8/learn/documentation/introduction/job_detail.png" alt="job-detail"></p>
+
+<h3 id="dataflow-graphs">Dataflow Graphs</h3>
+
+<p>We can compose multiple jobs to create a dataflow graph, where the nodes are streams containing data, and the edges are jobs performing transformations. This composition is done purely through the streams the jobs take as input and output. The jobs are otherwise totally decoupled: they need not be implemented in the same code base, and adding, removing, or restarting a downstream job will not impact an upstream job.</p>
+
+<p>These graphs are often acyclic&mdash;that is, data usually doesn&rsquo;t flow from a job, through other jobs, back to itself. However, it is possible to create cyclic graphs if you need to.</p>
+
+<p><img src="/img/0.8/learn/documentation/introduction/dag.png" width="430" alt="Directed acyclic job graph"></p>
+
+<h3 id="containers">Containers</h3>
+
+<p>Partitions and tasks are both <em>logical</em> units of parallelism&mdash;they don&rsquo;t correspond to any particular assignment of computational resources (CPU, memory, disk space, etc). Containers are the unit of physical parallelism, and a container is essentially a Unix process (or Linux <a href="http://en.wikipedia.org/wiki/Cgroups">cgroup</a>). Each container runs one or more tasks. The number of tasks is determined automatically from the number of partitions in the input and is fixed, but the number of containers (and the CPU and memory resources associated with them) is specified by the user at run time and can be changed at any time.</p>
+
+<h2 id="architecture-&raquo;"><a href="architecture.html">Architecture &raquo;</a></h2>
+
+
+          </div>
+        </div>
+
+      </div><!-- /.wrapper-content -->
+    </div><!-- /.wrapper -->
+
+    <div class="footer">
+      <div class="container">
+        <!-- nothing for now. -->
+      </div>
+    </div>
+
+  
+    <script>
+      $( document ).ready(function() {
+        if ( $.fn.urlExists( "/learn/documentation/latest/introduction/concepts.html" ) ) {
+          $("#switch-version-button").addClass("fa fa-history masthead-icon");
+        }
+      });
+
+      /* a function to test whether the url exists or not */
+      (function( $ ) {
+        $.fn.urlExists = function(url) {
+          var http = new XMLHttpRequest();
+          http.open('HEAD', url, false);
+          http.send();
+          return http.status != 404;
+        };
+      }( jQuery ));
+    </script>
+  
+
+    <!-- Google Analytics -->
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-43122768-1', 'apache.org');
+      ga('send', 'pageview');
+
+    </script>
+  </body>
+</html>