Posted to s4-commits@incubator.apache.org by mm...@apache.org on 2013/02/23 19:45:19 UTC

svn commit: r1449398 - in /incubator/s4/site: doc/ doc/0.6.0/ doc/0.6.0/S4_hierarchical_archi/ doc/0.6.0/application_dependencies/ doc/0.6.0/configuration/ doc/0.6.0/dev_tips/ doc/0.6.0/event_dispatch/ doc/0.6.0/fault_tolerance/ doc/0.6.0/overview/ doc...

Author: mmorel
Date: Sat Feb 23 19:45:18 2013
New Revision: 1449398

URL: http://svn.apache.org/r1449398
Log:
draft doc for S4 0.6.0

Added:
    incubator/s4/site/doc/
    incubator/s4/site/doc/0.6.0/
    incubator/s4/site/doc/0.6.0/S4_hierarchical_archi/
    incubator/s4/site/doc/0.6.0/S4_hierarchical_archi/index.html   (with props)
    incubator/s4/site/doc/0.6.0/application_dependencies/
    incubator/s4/site/doc/0.6.0/application_dependencies/index.html
    incubator/s4/site/doc/0.6.0/configuration/
    incubator/s4/site/doc/0.6.0/configuration/index.html
    incubator/s4/site/doc/0.6.0/dev_tips/
    incubator/s4/site/doc/0.6.0/dev_tips/index.html
    incubator/s4/site/doc/0.6.0/event_dispatch/
    incubator/s4/site/doc/0.6.0/event_dispatch/index.html
    incubator/s4/site/doc/0.6.0/fault_tolerance/
    incubator/s4/site/doc/0.6.0/fault_tolerance/index.html
    incubator/s4/site/doc/0.6.0/index.html
    incubator/s4/site/doc/0.6.0/overview/
    incubator/s4/site/doc/0.6.0/overview/index.html
    incubator/s4/site/doc/0.6.0/walkthrough/
    incubator/s4/site/doc/0.6.0/walkthrough/index.html
    incubator/s4/site/images/doc/
    incubator/s4/site/images/doc/0.6.0/
    incubator/s4/site/images/doc/0.6.0/S4_hierarchical_archi.png   (with props)
    incubator/s4/site/images/doc/0.6.0/checkpointing-framework.png   (with props)
    incubator/s4/site/images/doc/0.6.0/failover.png   (with props)
    incubator/s4/site/images/doc/0.6.0/s4_node_layers.png   (with props)
    incubator/s4/site/images/doc/0.6.0/sampleAppDeployment.png   (with props)
    incubator/s4/site/images/doc/0.6.0/sources/
    incubator/s4/site/images/doc/0.6.0/sources/s4_node_layers.odg   (with props)

Added: incubator/s4/site/doc/0.6.0/S4_hierarchical_archi/index.html
URL: http://svn.apache.org/viewvc/incubator/s4/site/doc/0.6.0/S4_hierarchical_archi/index.html?rev=1449398&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/s4/site/doc/0.6.0/S4_hierarchical_archi/index.html
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/s4/site/doc/0.6.0/application_dependencies/index.html
URL: http://svn.apache.org/viewvc/incubator/s4/site/doc/0.6.0/application_dependencies/index.html?rev=1449398&view=auto
==============================================================================
--- incubator/s4/site/doc/0.6.0/application_dependencies/index.html (added)
+++ incubator/s4/site/doc/0.6.0/application_dependencies/index.html Sat Feb 23 19:45:18 2013
@@ -0,0 +1,123 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html>
+  <head>
+    <title>S4: Adding application dependencies</title>
+    <meta content='text/html; charset=utf-8' http-equiv='Content-Type' />
+    <meta content='A general-purpose distributed stream computing platform' name='description' />
+    <link href='/style/screen.css' media='screen' rel='stylesheet' type='text/css' />
+    <link href='/style/print.css' media='print' rel='stylesheet' type='text/css' />
+    <!--[if lt IE 9]>
+      <link href='/style/ie.css' media='screen' rel='stylesheet' type='text/css' />
+    <![endif]-->
+    <link href='/style/style.css' rel='stylesheet' type='text/css' />
+    <link href='/style/nav.css' rel='stylesheet' type='text/css' />
+    <script type='text/javascript'>
+        var _gaq = _gaq || [];
+        _gaq.push(['_setAccount', 'UA-19490961-1']);
+        _gaq.push(['_setDomainName', '.s4.io']);
+        _gaq.push(['_trackPageview']);
+        (function() {
+          var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+          ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+          var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+        })();
+      </script>
+  </head>
+  <body>
+    <div id='header'>
+      <div class='container'>
+        <div id='logo'>
+          <a href='/'>
+            <img src='/images/s4_test.png' />
+          </a>
+        </div>
+        <div id='navbar'><ul id='nav'>
+  <li>
+    <a href='/'>home</a>
+  </li>
+  <li>
+    <a href='https://cwiki.apache.org/confluence/display/S4/S4+Wiki'>doc [0.5]</a>
+  </li>
+  <li>
+    <a href='http://git-wip-us.apache.org/repos/asf?p=incubator-s4.git' onClick="_gaq.push(['_trackEvent', 'External', 'Apache Git', 'http://github.com/s4']);">code</a>
+  </li>
+  <li>
+    <a href='http://people.apache.org/~mmorel/apache-s4-0.5.0-incubating-doc/javadoc/'>API</a>
+  </li>
+  <li>
+    <a href='/contrib'>get involved</a>
+  </li>
+  <li>
+    <a href='/team'>team</a>
+  </li>
+  <li>
+    <a href='/download'>download</a>
+  </li>
+</ul></div>
+      </div>
+    </div>
+    <div id='wrapper'>
+      <div class='container' id='container'><blockquote>
+  <p>Make sure you have already read the <a href="../walkthrough">walkthrough</a>.</p>
+</blockquote>
+
+<h1 id="how-to-add-dependencies-to-my-s4-application">How to add dependencies to my S4 application?</h1>
+
+<p>Your application typically depends on various external libraries. Here is how to configure those dependencies in an S4 project. We assume here that you are working with a sample project automatically generated through the <code>s4 newApp</code> script.</p>
+
+<h2 id="dependencies-on-public-artifacts">Dependencies on public artifacts</h2>
+
+<ul>
+  <li>
+    <p>Add Maven artifact definitions to the Gradle build file. For instance, to add twitter4j_core and twitter4j_stream:</p>
+
+    <pre><code>  project.ext["libraries"] = [
+             twitter4j_core:     'org.twitter4j:twitter4j-core:2.2.5',
+             twitter4j_stream:   'org.twitter4j:twitter4j-stream:2.2.5',
+             s4_base:            'org.apache.s4:s4-base:0.5.0',
+             s4_comm:            'org.apache.s4:s4-comm:0.5.0',
+             s4_core:            'org.apache.s4:s4-core:0.5.0'
+         ]
+</code></pre>
+  </li>
+  <li>
+    <p>Add these dependencies as compile-time dependencies. For instance:</p>
+
+    <pre><code>  dependencies {
+     compile (libraries.s4_base)
+     compile (libraries.s4_comm)
+     compile (libraries.s4_core)
+     compile (libraries.twitter4j_core)
+     compile (libraries.twitter4j_stream)
+  }
+</code></pre>
+  </li>
+  <li>
+    <p>If you use an IDE such as Eclipse, you may update your project&#8217;s classpath with: <code>./gradlew eclipse</code></p>
+  </li>
+</ul>
+
+<p>A good place to look up artifacts is <a href="http://search.maven.org/">http://search.maven.org/</a>, which also shows how to declare them for various build tools: the Grails syntax can be used in Gradle scripts.</p>
+
+<blockquote>
+  <p>The application dependencies will be automatically included in the s4r archive that you create and publish.</p>
+</blockquote>
+
+<h2 id="dependencies-on-non-public-artifacts">Dependencies on non-public artifacts</h2>
+
+<p>You may have dependencies that are not published to maven repositories. In that case you should either:</p>
+
+<ul>
+  <li>install them in your local Maven repository, see <a href="http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html">http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html</a></li>
+  <li>add them to the <em>lib</em> directory</li>
+</ul>
+
+<p>In both cases you still have to declare them as compile-time dependencies.</p></div>
+    </div>
+    <div id='footer'>
+      <div class='container'>
+        <span class='copyright'>Apache S4 - Copyright 2013 The Apache Software Foundation</span>
+      </div>
+    </div>
+  </body>
+</html>

Added: incubator/s4/site/doc/0.6.0/configuration/index.html
URL: http://svn.apache.org/viewvc/incubator/s4/site/doc/0.6.0/configuration/index.html?rev=1449398&view=auto
==============================================================================
--- incubator/s4/site/doc/0.6.0/configuration/index.html (added)
+++ incubator/s4/site/doc/0.6.0/configuration/index.html Sat Feb 23 19:45:18 2013
@@ -0,0 +1,240 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html>
+  <head>
+    <title>S4: Configuration</title>
+    <meta content='text/html; charset=utf-8' http-equiv='Content-Type' />
+    <meta content='A general-purpose distributed stream computing platform' name='description' />
+    <link href='/style/screen.css' media='screen' rel='stylesheet' type='text/css' />
+    <link href='/style/print.css' media='print' rel='stylesheet' type='text/css' />
+    <!--[if lt IE 9]>
+      <link href='/style/ie.css' media='screen' rel='stylesheet' type='text/css' />
+    <![endif]-->
+    <link href='/style/style.css' rel='stylesheet' type='text/css' />
+    <link href='/style/nav.css' rel='stylesheet' type='text/css' />
+    <script type='text/javascript'>
+        var _gaq = _gaq || [];
+        _gaq.push(['_setAccount', 'UA-19490961-1']);
+        _gaq.push(['_setDomainName', '.s4.io']);
+        _gaq.push(['_trackPageview']);
+        (function() {
+          var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+          ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+          var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+        })();
+      </script>
+  </head>
+  <body>
+    <div id='header'>
+      <div class='container'>
+        <div id='logo'>
+          <a href='/'>
+            <img src='/images/s4_test.png' />
+          </a>
+        </div>
+        <div id='navbar'><ul id='nav'>
+  <li>
+    <a href='/'>home</a>
+  </li>
+  <li>
+    <a href='https://cwiki.apache.org/confluence/display/S4/S4+Wiki'>doc [0.5]</a>
+  </li>
+  <li>
+    <a href='http://git-wip-us.apache.org/repos/asf?p=incubator-s4.git' onClick="_gaq.push(['_trackEvent', 'External', 'Apache Git', 'http://github.com/s4']);">code</a>
+  </li>
+  <li>
+    <a href='http://people.apache.org/~mmorel/apache-s4-0.5.0-incubating-doc/javadoc/'>API</a>
+  </li>
+  <li>
+    <a href='/contrib'>get involved</a>
+  </li>
+  <li>
+    <a href='/team'>team</a>
+  </li>
+  <li>
+    <a href='/download'>download</a>
+  </li>
+</ul></div>
+      </div>
+    </div>
+    <div id='wrapper'>
+      <div class='container' id='container'><h1 id="toolset">Toolset</h1>
+
+<p>S4 provides a set of tools to:</p>
+
+<ul>
+  <li>define S4 clusters: <code>s4 newCluster</code></li>
+  <li>start S4 nodes: <code>s4 node</code></li>
+  <li>package applications: <code>s4 s4r</code></li>
+  <li>deploy applications: <code>s4 deploy</code></li>
+  <li>start a Zookeeper server for easy testing: <code>s4 zkServer</code>
+    <ul>
+      <li><code>s4 zkServer -t</code> will start a Zookeeper server and automatically configure 2 clusters</li>
+    </ul>
+  </li>
+  <li>view the status of S4 clusters coordinated by a given Zookeeper ensemble: <code>s4 status</code></li>
+</ul>
+
+<pre><code>./s4
+</code></pre>
+
+<p>will give you a list of available commands.</p>
+
+<pre><code>./s4 &lt;command&gt; -help
+</code></pre>
+
+<p>will provide detailed documentation for each of these commands.</p>
+
+<h1 id="cluster-configuration">Cluster configuration</h1>
+
+<p>Before starting S4 nodes, you must define a logical cluster by specifying:</p>
+
+<ul>
+  <li>a name for the cluster</li>
+  <li>a number of partitions (~ tasks)</li>
+  <li>an initial port number for listener sockets
+    <ul>
+      <li>you must specify a free port. Each node of the cluster opens a different port, numbered consecutively from the initial port: for instance, for a cluster of 10 nodes and an initial port 12000, ports 12000 to 12009 will be used among the nodes.</li>
+      <li>those ports are used for inter-node communication.</li>
+    </ul>
+  </li>
+</ul>
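+<p>The port arithmetic above can be sketched in Java. This is an illustration only: it assumes that node <em>i</em> (counting from 0) listens on the initial port plus <em>i</em>, whereas the actual assignment of ports to nodes is handled by S4. <code>ClusterPorts</code> and its methods are hypothetical names; the availability check simply tries to bind a <code>java.net.ServerSocket</code> on the local host.</p>

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper: ports used by a cluster, assuming node i (0-based)
// listens on initialPort + i.
class ClusterPorts {

    // Returns the listener ports for nbNodes nodes, starting at initialPort.
    static int[] portsFor(int initialPort, int nbNodes) {
        int[] ports = new int[nbNodes];
        for (int i = 0; i < nbNodes; i++) {
            ports[i] = initialPort + i;
        }
        return ports;
    }

    // Returns true if a server socket can currently bind to the port on this host.
    static boolean isFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Returns true if every port in the cluster's range is currently free.
    static boolean allFree(int initialPort, int nbNodes) {
        for (int port : portsFor(initialPort, nbNodes)) {
            if (!isFree(port)) {
                return false;
            }
        }
        return true;
    }
}
```

+<p>For instance, <code>ClusterPorts.portsFor(12000, 10)</code> returns the ports 12000 to 12009. A port reported free may still be taken by another process before the nodes start, so such a check only catches obvious conflicts on the host where it runs.</p>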
+
+<p>The cluster configuration is maintained in Zookeeper, and can be set using S4 tools:</p>
+
+<pre><code>./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000
+</code></pre>
+
+<p>See the tool documentation by typing:</p>
+
+<pre><code>./s4 newCluster -help
+</code></pre>
+
+<h1 id="node-configuration">Node configuration</h1>
+
+<p><em>Platform code and application code are fully configurable at deployment time.</em></p>
+
+<p>S4 nodes start as simple <em>bootstrap</em> processes whose initial role is merely to connect to the cluster manager:</p>
+
+<ul>
+  <li>the bootstrap code connects to the cluster manager</li>
+  <li>when an application is available on the cluster, the node gets notified</li>
+  <li>it downloads the platform configuration and code, as specified in the configuration of the deployed application.</li>
+  <li>the communication and core components are loaded, bound and initialized</li>
+  <li>the application configuration and code, as specified in the configuration of the deployed application, are downloaded</li>
+  <li>the application is initialized and started</li>
+</ul>
+
+<p>This figure illustrates the separation between the bootstrap code, the S4 platform code, and application code in an S4 node:</p>
+
+<p><img src="/images/doc/0.6.0/s4_node_layers.png" alt="image" /></p>
+
+<p>Therefore, for starting an S4 node on a given host, you only need to specify:</p>
+
+<ul>
+  <li>the connection string to the cluster management system (Zookeeper), <code>localhost:2181</code> by default</li>
+  <li>the name of the logical cluster to which this node will belong</li>
+</ul>
+
+<p>Example:
+<code>./s4 node -c=cluster1 -zk=host.domain.com</code></p>
+
+<h1 id="application-configuration">Application configuration</h1>
+
+<p>Deploying applications is easier when we can define both the parameters of the application <em>and</em> the target environment.</p>
+
+<p>In S4, we achieve this by specifying <em>both</em> application parameters and S4 platform parameters in the deployment phase:</p>
+
+<ul>
+  <li>which application class to use</li>
+  <li>where to fetch application code</li>
+  <li>which specific modules to use</li>
+  <li>where to fetch these modules</li>
+  <li>string configuration parameters that can be used by the application and the modules</li>
+</ul>
+
+<h2 id="modules-configuration">Modules configuration</h2>
+
+<p>S4 follows a modular design and uses <a href="http://code.google.com/p/google-guice/">Guice</a> for defining modules and injecting dependencies.</p>
+
+<p>As illustrated above, an S4 node is composed of:</p>
+
+<ul>
+  <li>a base module that specifies how to connect to the cluster manager and how to download code</li>
+  <li>a communication module that specifies communication protocols, event listeners and senders</li>
+  <li>a core module that specifies the deployment mechanism and the serialization mechanism</li>
+  <li>an application</li>
+</ul>
+
+<h3 id="default-parameters">default parameters</h3>
+
+<p>For the <a href="https://github.com/apache/incubator-s4/blob/dev/subprojects/s4-comm/src/main/resources/default.s4.comm.properties">comm module</a>: communication protocols, tuning parameters for sending events</p>
+
+<p>For the core module, there are no default parameters.</p>
+
+<h3 id="overriding-modules">overriding modules</h3>
+
+<p>S4 provides default modules, but you may specify other modules through the command line, both to override the default ones and to add new ones. Custom module classes must provide an empty no-args constructor.</p>
+
+<p>Custom modules can be specified when deploying the application, through the <code>-emc</code> (or <code>-modulesClasses</code>) option of the <code>deploy</code> command.</p>
+
+<p>For instance, in order to enable file system based checkpointing, pass the corresponding checkpointing module class:</p>
+
+<pre><code>./s4 deploy -s4r=uri/to/app.s4r -c=cluster1 -appName=myApp \
+-emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule 
+</code></pre>
+
+<p>You can also write your own custom modules. In that case, package them into a jar file, and specify how to fetch that file when deploying the application, with the <code>-mu</code> (or <code>-modulesURIs</code>) option.</p>
+
+<p>For instance, if you checkpoint through a specific key-value store, you can write your own checkpointing implementation and module, package them into <code>fancyKeyValueStoreCheckpointingModule.jar</code>, and then:</p>
+
+<pre><code>./s4 deploy -s4r=uri/to/app.s4r -c=cluster1 -appName=myApp \
+-emc=my.project.FancyKeyValueStoreBackendCheckpointingModule \
+-mu=uri/to/fancyKeyValueStoreCheckpointingModule.jar
+</code></pre>
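+<p>Since modules are passed as class names, they have to be instantiated reflectively, which is presumably why a no-args constructor is required. The following sketch illustrates the idea with the standard Java reflection API. It is not S4&#8217;s loading code: <code>ModuleLoader</code> is a hypothetical name, and its <code>moduleJars</code> parameter merely plays the role of the URIs passed with <code>-mu</code>.</p>

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch: instantiating a module given by class name (as with -emc),
// optionally found in extra jars (as with -mu). Not S4's actual loading code.
class ModuleLoader {

    static Object instantiate(String className, URL[] moduleJars)
            throws ReflectiveOperationException {
        // The loader asks the current class path first, then searches moduleJars.
        var loader = new URLClassLoader(moduleJars, ModuleLoader.class.getClassLoader());
        var moduleClass = Class.forName(className, true, loader);
        // Without knowing the class, only a no-args constructor can be called:
        // hence the requirement on custom module classes.
        return moduleClass.getDeclaredConstructor().newInstance();
    }
}
```

+<p>For instance, <code>ModuleLoader.instantiate("java.lang.StringBuilder", new URL[0])</code> succeeds, whereas a class without a no-args constructor, such as <code>java.lang.Integer</code>, fails with a <code>NoSuchMethodException</code>. The class loader is deliberately left open, because the module may load more of its classes later.</p>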
+
+<h3 id="overriding-parameters">overriding parameters</h3>
+
+<p>A simple way to pass parameters to your application code is by:</p>
+
+<ul>
+  <li>
+    <p>injecting them in the application class:</p>
+
+    <pre><code>  @Inject
+  @Named("myParam")
+  String myParam;
+</code></pre>
+  </li>
+  <li>
+    <p>specifying the parameter value when deploying the application (using <code>-p</code> inline with the <code>deploy</code> command, or with the &#8216;@&#8217; syntax)</p>
+  </li>
+</ul>
+
+<p>S4 uses an internal Guice module that automatically injects configuration parameters passed through the deploy command to matching <code>@Named</code> parameters.</p>
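+<p>To illustrate how matching by name works, here is a self-contained sketch that mimics it with plain Java reflection. It does not use Guice and is not S4&#8217;s internal module: <code>@ParamName</code> is a hypothetical stand-in for <code>@Named</code>, <code>NamedParamInjector</code> and <code>MyApp</code> are hypothetical classes, and only <code>String</code> fields are handled.</p>

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Properties;

// Hypothetical stand-in for Guice's @Named, so the sketch needs no Guice jar.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ParamName {
    String value();
}

// Sketch of name-based injection: every @ParamName field receives the value
// of the configuration parameter with the same name.
class NamedParamInjector {

    static void inject(Object target, Properties params) throws IllegalAccessException {
        for (Field field : target.getClass().getDeclaredFields()) {
            ParamName name = field.getAnnotation(ParamName.class);
            if (name == null) {
                continue; // not a configuration parameter
            }
            String value = params.getProperty(name.value());
            if (value == null) {
                // Fail fast, as Guice does when a required binding is missing.
                throw new IllegalStateException("No value for parameter " + name.value());
            }
            field.setAccessible(true);
            field.set(target, value);
        }
    }
}

// Example application class with one injected parameter.
class MyApp {
    @ParamName("myParam")
    String myParam;
}
```

+<p>With <code>myParam=hello</code> among the parameters, injecting into a <code>MyApp</code> instance sets its <code>myParam</code> field to <code>"hello"</code>; a missing parameter is reported as an error instead of silently leaving the field <code>null</code>.</p>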
+
+<p>Both application and platform parameters can be overridden. For instance, to specify a custom storage path for the file system based checkpointing mechanism, pass the <code>s4.checkpointing.filesystem.storageRootPath</code> parameter:</p>
+
+<pre><code>./s4 deploy -s4r=uri/to/app.s4r -c=cluster1 -appName=myApp \
+-emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule \
+-p=s4.checkpointing.filesystem.storageRootPath=/custom/path
+</code></pre>
+
+<h2 id="file-based-configuration">File-based configuration</h2>
+
+<p>Instead of specifying deployment parameters inline, you may refer to a file with the &#8216;@&#8217; notation, for instance <code>./s4 deploy @/path/to/config/file</code>, where the referenced file contains something like:</p>
+
+<pre><code>-s4r=uri/to/app.s4r
+-c=cluster1
+-appName=myApp
+-emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule
+-p=param1=value1,param2=value2
+</code></pre>
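+<p>The two conventions above, the &#8216;@&#8217; file reference and the comma-separated <code>-p</code> list, can be sketched in Java. This assumes one option per line in the referenced file and <code>key=value</code> entries for <code>-p</code>, each split on its first <code>=</code>. S4 parses its command line with its own tooling, so treat <code>DeployArgs</code> and its methods as hypothetical helpers, not as a description of S4&#8217;s parser.</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Properties;

// Hypothetical helpers illustrating the '@' file notation and the -p format.
class DeployArgs {

    // Replaces each "@/path/to/file" argument with the non-blank, trimmed lines
    // of that file (one option per line); other arguments are kept as they are.
    static String[] expandAtFiles(String[] args) throws IOException {
        String[][] parts = new String[args.length][];
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("@")) {
                parts[i] = Files.readAllLines(Path.of(arg.substring(1))).stream()
                        .map(String::trim)
                        .filter(line -> !line.isEmpty())
                        .toArray(String[]::new);
            } else {
                parts[i] = new String[] { arg };
            }
        }
        return Arrays.stream(parts).flatMap(Arrays::stream).toArray(String[]::new);
    }

    // Parses a -p value such as "param1=value1,param2=value2". Each entry is
    // split on its first '=', so values may contain '=' but not ','.
    static Properties parseParams(String value) {
        Properties params = new Properties();
        for (String entry : value.split(",")) {
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + entry);
            }
            params.setProperty(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return params;
    }
}
```

+<p>With this sketch, <code>parseParams("param1=value1,param2=value2")</code> maps <code>param1</code> to <code>value1</code> and <code>param2</code> to <code>value2</code>. Splitting on the first <code>=</code> keeps values such as <code>/custom/path</code> intact, but a value containing a comma cannot be expressed in this format.</p>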
+
+<h2 id="logging">Logging</h2>
+
+<p>S4 uses <a href="http://logback.qos.ch/">logback</a>, and <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=blob_plain;f=subprojects/s4-core/src/main/resources/logback.xml;h=ea8c85a104b475f1b9dea641656e76eb3b6a9d4c;hb=piper">here</a> is the default configuration file. You may tweak this configuration by adding your own logback.xml file in the <code>lib/</code> directory (for a binary release) or in the <code>subprojects/s4-tools/build/install/s4-tools/lib/</code> directory (for a source release or checkout from git).</p></div>
+    </div>
+    <div id='footer'>
+      <div class='container'>
+        <span class='copyright'>Apache S4 - Copyright 2013 The Apache Software Foundation</span>
+      </div>
+    </div>
+  </body>
+</html>

Added: incubator/s4/site/doc/0.6.0/dev_tips/index.html
URL: http://svn.apache.org/viewvc/incubator/s4/site/doc/0.6.0/dev_tips/index.html?rev=1449398&view=auto
==============================================================================
--- incubator/s4/site/doc/0.6.0/dev_tips/index.html (added)
+++ incubator/s4/site/doc/0.6.0/dev_tips/index.html Sat Feb 23 19:45:18 2013
@@ -0,0 +1,118 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html>
+  <head>
+    <title>S4: Development tips</title>
+    <meta content='text/html; charset=utf-8' http-equiv='Content-Type' />
+    <meta content='A general-purpose distributed stream computing platform' name='description' />
+    <link href='/style/screen.css' media='screen' rel='stylesheet' type='text/css' />
+    <link href='/style/print.css' media='print' rel='stylesheet' type='text/css' />
+    <!--[if lt IE 9]>
+      <link href='/style/ie.css' media='screen' rel='stylesheet' type='text/css' />
+    <![endif]-->
+    <link href='/style/style.css' rel='stylesheet' type='text/css' />
+    <link href='/style/nav.css' rel='stylesheet' type='text/css' />
+    <script type='text/javascript'>
+        var _gaq = _gaq || [];
+        _gaq.push(['_setAccount', 'UA-19490961-1']);
+        _gaq.push(['_setDomainName', '.s4.io']);
+        _gaq.push(['_trackPageview']);
+        (function() {
+          var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+          ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+          var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+        })();
+      </script>
+  </head>
+  <body>
+    <div id='header'>
+      <div class='container'>
+        <div id='logo'>
+          <a href='/'>
+            <img src='/images/s4_test.png' />
+          </a>
+        </div>
+        <div id='navbar'><ul id='nav'>
+  <li>
+    <a href='/'>home</a>
+  </li>
+  <li>
+    <a href='https://cwiki.apache.org/confluence/display/S4/S4+Wiki'>doc [0.5]</a>
+  </li>
+  <li>
+    <a href='http://git-wip-us.apache.org/repos/asf?p=incubator-s4.git' onClick="_gaq.push(['_trackEvent', 'External', 'Apache Git', 'http://github.com/s4']);">code</a>
+  </li>
+  <li>
+    <a href='http://people.apache.org/~mmorel/apache-s4-0.5.0-incubating-doc/javadoc/'>API</a>
+  </li>
+  <li>
+    <a href='/contrib'>get involved</a>
+  </li>
+  <li>
+    <a href='/team'>team</a>
+  </li>
+  <li>
+    <a href='/download'>download</a>
+  </li>
+</ul></div>
+      </div>
+    </div>
+    <div id='wrapper'>
+      <div class='container' id='container'><p>Here are a few tips to ease the development of S4 applications.</p>
+
+<h3 id="import-an-s4-project-into-your-ide">Import an S4 project into your IDE</h3>
+
+<p>You can run <code>gradlew eclipse</code> or <code>gradlew idea</code> at the root of your S4 application directory, then import the project into Eclipse or IntelliJ IDEA. Both your application classes <em>and</em> the S4 libraries will be on the classpath of the project.</p>
+
+<p>In order to get the transitive dependencies of the platform included as well, you should:</p>
+
+<ul>
+  <li>Download a source distribution</li>
+  <li>
+    <p>Install S4 and its dependencies in your local maven repository</p>
+
+    <pre><code>  # from the S4 source distribution root directory
+  ./gradlew install -DskipTests
+</code></pre>
+  </li>
+  <li>Then run <code>gradlew eclipse</code> or <code>gradlew idea</code></li>
+</ul>
+
+<h3 id="start-a-local-zookeeper-instance">Start a local Zookeeper instance</h3>
+
+<ul>
+  <li>
+    <p>Use the default test configuration (2 clusters with following configs: <code>c=testCluster1:flp=12000:nbTasks=1</code> and <code>c=testCluster2:flp=13000:nbTasks=1</code>)</p>
+
+    <pre><code>  s4 zkServer -t
+</code></pre>
+  </li>
+  <li>
+    <p>Start a Zookeeper instance with your custom configuration, e.g. with 1 partition:</p>
+
+    <pre><code>  s4 zkServer -clusters=c=testCluster1:flp=12000:nbTasks=1
+</code></pre>
+  </li>
+</ul>
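+<p>In the examples above, a cluster specification is a colon-separated list of <code>key=value</code> pairs. A minimal sketch of reading one such specification, assuming exactly the three keys used above (<code>c</code>, <code>flp</code> and <code>nbTasks</code>); <code>ClusterSpec</code> is a hypothetical class, not part of S4, and S4&#8217;s own parsing may accept more than this.</p>

```java
import java.util.Properties;

// Hypothetical sketch: reading a spec such as "c=testCluster1:flp=12000:nbTasks=1".
class ClusterSpec {
    final String name;      // c: cluster name
    final int initialPort;  // flp: initial listener port
    final int nbTasks;      // nbTasks: number of partitions

    ClusterSpec(String name, int initialPort, int nbTasks) {
        this.name = name;
        this.initialPort = initialPort;
        this.nbTasks = nbTasks;
    }

    static ClusterSpec parse(String spec) {
        Properties fields = new Properties();
        for (String pair : spec.split(":")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            fields.setProperty(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return new ClusterSpec(
                required(fields, "c"),
                Integer.parseInt(required(fields, "flp")),
                Integer.parseInt(required(fields, "nbTasks")));
    }

    // Returns the value for key, or fails with a message naming the missing key.
    private static String required(Properties fields, String key) {
        String value = fields.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing '" + key + "' in cluster spec");
        }
        return value;
    }
}
```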
+
+<h3 id="load-an-application-in-a-new-node-directly-from-an-ide">Load an application in a new node directly from an IDE</h3>
+
+<p>This allows you to <em>skip the packaging phase!</em></p>
+
+<p>This requires both the application classes and the S4 classes in your classpath (see above).</p>
+
+<p>Then you just need to run the <code>org.apache.s4.core.Main</code> class and pass:</p>
+
+<ul>
+  <li>the cluster name: <code>-c=testCluster1</code></li>
+  <li>the app class name: <code>-appClass=myAppClass</code></li>
+</ul>
+
+<p>If you use a local Zookeeper instance, there is no need to specify the <code>-zk</code> option.</p></div>
+    </div>
+    <div id='footer'>
+      <div class='container'>
+        <span class='copyright'>Apache S4 - Copyright 2013 The Apache Software Foundation</span>
+      </div>
+    </div>
+  </body>
+</html>

Added: incubator/s4/site/doc/0.6.0/event_dispatch/index.html
URL: http://svn.apache.org/viewvc/incubator/s4/site/doc/0.6.0/event_dispatch/index.html?rev=1449398&view=auto
==============================================================================
--- incubator/s4/site/doc/0.6.0/event_dispatch/index.html (added)
+++ incubator/s4/site/doc/0.6.0/event_dispatch/index.html Sat Feb 23 19:45:18 2013
@@ -0,0 +1,135 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html>
+  <head>
+    <title>S4: Event dispatch</title>
+    <meta content='text/html; charset=utf-8' http-equiv='Content-Type' />
+    <meta content='A general-purpose distributed stream computing platform' name='description' />
+    <link href='/style/screen.css' media='screen' rel='stylesheet' type='text/css' />
+    <link href='/style/print.css' media='print' rel='stylesheet' type='text/css' />
+    <!--[if lt IE 9]>
+      <link href='/style/ie.css' media='screen' rel='stylesheet' type='text/css' />
+    <![endif]-->
+    <link href='/style/style.css' rel='stylesheet' type='text/css' />
+    <link href='/style/nav.css' rel='stylesheet' type='text/css' />
+    <script type='text/javascript'>
+        var _gaq = _gaq || [];
+        _gaq.push(['_setAccount', 'UA-19490961-1']);
+        _gaq.push(['_setDomainName', '.s4.io']);
+        _gaq.push(['_trackPageview']);
+        (function() {
+          var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+          ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+          var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+        })();
+      </script>
+  </head>
+  <body>
+    <div id='header'>
+      <div class='container'>
+        <div id='logo'>
+          <a href='/'>
+            <img src='/images/s4_test.png' />
+          </a>
+        </div>
+        <div id='navbar'><ul id='nav'>
+  <li>
+    <a href='/'>home</a>
+  </li>
+  <li>
+    <a href='https://cwiki.apache.org/confluence/display/S4/S4+Wiki'>doc [0.5]</a>
+  </li>
+  <li>
+    <a href='http://git-wip-us.apache.org/repos/asf?p=incubator-s4.git' onClick="_gaq.push(['_trackEvent', 'External', 'Apache Git', 'http://github.com/s4']);">code</a>
+  </li>
+  <li>
+    <a href='http://people.apache.org/~mmorel/apache-s4-0.5.0-incubating-doc/javadoc/'>API</a>
+  </li>
+  <li>
+    <a href='/contrib'>get involved</a>
+  </li>
+  <li>
+    <a href='/team'>team</a>
+  </li>
+  <li>
+    <a href='/download'>download</a>
+  </li>
+</ul></div>
+      </div>
+    </div>
+    <div id='wrapper'>
+      <div class='container' id='container'><p>Events are dispatched according to their key.</p>
+
+<p>The key is identified in an <code>Event</code> through a <code>KeyFinder</code>.</p>
+
+<p>Dispatch can be configured for:</p>
+
+<ul>
+  <li>dispatching events to partitions (<em>outgoing dispatch</em>)</li>
+  <li>dispatching external events within a partition (<em>incoming dispatch</em>)</li>
+</ul>
+
+<h1 id="outgoing-dispatch">Outgoing dispatch</h1>
+
+<p>A stream can be defined with a KeyFinder, as follows:</p>
+
+<pre><code>Stream&lt;TopicEvent&gt; topicSeenStream = createStream("TopicSeen", new KeyFinder&lt;TopicEvent&gt;() {
+
+    @Override
+    public List&lt;String&gt; get(final TopicEvent arg0) {
+        return ImmutableList.of(arg0.getTopic());
+    }
+}, topicCountAndReportPE);
+</code></pre>
+
+<p>When an event is sent to the &#8220;TopicSeen&#8221; stream, its key will be identified through the KeyFinder implementation, hashed and dispatched to the matching partition.</p>
+
+<p>The same logic applies when defining <em>output streams</em>.</p>
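+<p>The hashing step can be sketched as follows: the key is hashed and the hash is mapped to one of the partitions, so that all events with the same key reach the same partition. This sketch assumes the values returned by the <code>KeyFinder</code> have already been combined into a single key string, and it uses <code>String.hashCode</code>, which may differ from the hash function S4 actually uses. <code>KeyPartitioner</code> is a hypothetical name.</p>

```java
// Hypothetical sketch of keyed dispatch: the same key always maps to the same partition.
class KeyPartitioner {

    static int partitionFor(String key, int nbPartitions) {
        if (nbPartitions <= 0) {
            throw new IllegalArgumentException("nbPartitions must be positive");
        }
        // floorMod keeps the result in [0, nbPartitions) even for negative hash codes.
        return Math.floorMod(key.hashCode(), nbPartitions);
    }
}
```

+<p><code>Math.floorMod</code> is used instead of <code>%</code> because <code>hashCode()</code> can be negative, and a plain remainder would then yield a negative partition index.</p>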
+
+<p>If we use an AdapterApp subclass, the <code>remoteStreamKeyFinder</code> should be defined in the <code>onInit()</code> method, <em>before</em> calling <code>super.onInit()</code>:</p>
+
+<pre><code>@Override
+protected void onInit() {
+    // ...
+    remoteStreamKeyFinder = new KeyFinder&lt;Event&gt;() {
+
+        @Override
+        public List&lt;String&gt; get(Event event) {
+            return ImmutableList.of(event.get("theKeyField"));
+        }
+    };
+    super.onInit();
+    // ...
+}
+</code></pre>
+
+<p>If we use a standard App, we use the <code>createOutputStream(String name, KeyFinder&lt;Event&gt; keyFinder)</code> method.</p>
+
+<blockquote>
+  <p>If the KeyFinder is not defined for the output streams, events are sent to partitions of the connected cluster in a round-robin fashion.</p>
+</blockquote>
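+<p>Round-robin dispatch amounts to a shared counter taken modulo the number of partitions. Here is a minimal, thread-safe sketch with a hypothetical <code>RoundRobinPartitioner</code> class; it illustrates the principle and is not S4&#8217;s sender implementation.</p>

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: successive events go to partitions 0, 1, ..., n-1, 0, 1, ...
class RoundRobinPartitioner {
    private final int nbPartitions;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinPartitioner(int nbPartitions) {
        if (nbPartitions <= 0) {
            throw new IllegalArgumentException("nbPartitions must be positive");
        }
        this.nbPartitions = nbPartitions;
    }

    // Safe to call from several sender threads: each call takes a distinct counter value.
    // floorMod keeps the index valid after the counter overflows and becomes negative.
    int nextPartition() {
        return Math.floorMod(counter.getAndIncrement(), nbPartitions);
    }
}
```

+<p>With 3 partitions, successive calls return 0, 1, 2, 0, and so on. The sequence is only approximately even at the point where the counter overflows, which is harmless for load spreading.</p>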
+
+<h1 id="incoming-dispatch-from-external-events">Incoming dispatch from external events</h1>
+
+<p>When receiving events from a remote application, we <em>must</em> define how external events are dispatched internally: to which PEs, and based on which keys. For that purpose, we simply define an <em>input stream</em> with the corresponding KeyFinder:</p>
+
+<pre><code>createInputStream("names", new KeyFinder&lt;Event&gt;() {
+
+    @Override
+    public List&lt;String&gt; get(Event event) {
+        return Arrays.asList(new String[] { event.get("name") });
+    }
+}, helloPE);
+</code></pre>
+
+<p>In this case, a name is extracted from each event, the PE instance with this key is retrieved or created, and the event is sent to that instance.</p>
+
+<p>Alternatively, we can use a unique PE instance for processing events in a given node. For that we simply define the input stream without a KeyFinder, <em>and</em> use a singleton PE:</p>
+
+<pre><code>HelloPE helloPE = createPE(HelloPE.class);
+helloPE.setSingleton(true);
+createInputStream("names", helloPE);
+</code></pre>
+
+<p>In this case, all events will be dispatched to the only HelloPE instance in this partition, regardless of the content of the event.</p></div>
+    </div>
+    <div id='footer'>
+      <div class='container'>
+        <span class='copyright'>Apache S4 - Copyright 2013 The Apache Software Foundation</span>
+      </div>
+    </div>
+  </body>
+</html>

Added: incubator/s4/site/doc/0.6.0/fault_tolerance/index.html
URL: http://svn.apache.org/viewvc/incubator/s4/site/doc/0.6.0/fault_tolerance/index.html?rev=1449398&view=auto
==============================================================================
--- incubator/s4/site/doc/0.6.0/fault_tolerance/index.html (added)
+++ incubator/s4/site/doc/0.6.0/fault_tolerance/index.html Sat Feb 23 19:45:18 2013
@@ -0,0 +1,236 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html>
+  <head>
+    <title>S4: Fault tolerance</title>
+    <meta content='text/html; charset=utf-8' http-equiv='Content-Type' />
+    <meta content='A general-purpose distributed stream computing platform' name='description' />
+    <link href='/style/screen.css' media='screen' rel='stylesheet' type='text/css' />
+    <link href='/style/print.css' media='print' rel='stylesheet' type='text/css' />
+    <!--[if lt IE 9]>
+      <link href='/style/ie.css' media='screen' rel='stylesheet' type='text/css' />
+    <![endif]-->
+    <link href='/style/style.css' rel='stylesheet' type='text/css' />
+    <link href='/style/nav.css' rel='stylesheet' type='text/css' />
+    <script type='text/javascript'>
+        var _gaq = _gaq || [];
+        _gaq.push(['_setAccount', 'UA-19490961-1']);
+        _gaq.push(['_setDomainName', '.s4.io']);
+        _gaq.push(['_trackPageview']);
+        (function() {
+          var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+          ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+          var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+        })();
+      </script>
+  </head>
+  <body>
+    <div id='header'>
+      <div class='container'>
+        <div id='logo'>
+          <a href='/'>
+            <img src='/images/s4_test.png' />
+          </a>
+        </div>
+        <div id='navbar'><ul id='nav'>
+  <li>
+    <a href='/'>home</a>
+  </li>
+  <li>
+    <a href='https://cwiki.apache.org/confluence/display/S4/S4+Wiki'>doc [0.5]</a>
+  </li>
+  <li>
+    <a href='http://git-wip-us.apache.org/repos/asf?p=incubator-s4.git' onClick="_gaq.push(['_trackEvent', 'External', 'Apache Git', 'http://github.com/s4']);">code</a>
+  </li>
+  <li>
+    <a href='http://people.apache.org/~mmorel/apache-s4-0.5.0-incubating-doc/javadoc/'>API</a>
+  </li>
+  <li>
+    <a href='/contrib'>get involved</a>
+  </li>
+  <li>
+    <a href='/team'>team</a>
+  </li>
+  <li>
+    <a href='/download'>download</a>
+  </li>
+</ul></div>
+      </div>
+    </div>
+    <div id='wrapper'>
+      <div class='container' id='container'><p>Stream processing applications are typically long-running applications, and they may accumulate state over extended periods of time.</p>
+
+<p>Running a distributed system over a long period of time implies there will be:</p>
+
+<ul>
+  <li>failures</li>
+  <li>infrastructure updates</li>
+  <li>scheduled restarts</li>
+  <li>application updates</li>
+</ul>
+
+<p>In each of these situations, some or all of the S4 nodes will be shut down. The system may therefore be partly unavailable, and in-memory state accumulated during the execution may be lost.</p>
+
+<p>In order to deal with these situations while preserving low processing latency, S4 provides:</p>
+
+<ul>
+  <li>high availability</li>
+  <li>state recovery (based on checkpointing)</li>
+</ul>
+
+<p>This document first describes the high availability mechanism implemented in S4, then the checkpointing and recovery mechanism and how to customize it.</p>
+
+<h1 id="fail-over-mechanism">Fail-over mechanism</h1>
+
+<p>In order to guarantee availability in the presence of sudden node failures, S4 provides a mechanism to automatically detect failed nodes and redirect messages to a standby node.</p>
+
+<p>The following figure illustrates this fail-over mechanism: </p>
+
+<p><img src="/images/doc/0.6.0/failover.png" alt="image" /></p>
+
+<p>This technique provides high availability but does not prevent state loss.</p>
+
+<h2 id="configuration">Configuration</h2>
+
+<h5 id="number-of-standby-nodes">Number of standby nodes</h5>
+
+<p>S4 clusters are defined with a fixed number of tasks (tasks are also referred to as partitions). If you have n partitions and start m nodes, with m&gt;n, you get m-n standby nodes.</p>
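+<p>The following minimal sketch illustrates this arithmetic with a purely in-memory model: in this sketch, the first n nodes to join each take a free partition, the remaining nodes wait as standby nodes, and the first standby node takes over the partition of a failed node. The <code>FailoverSimulation</code> class and its methods are hypothetical; S4 itself relies on ZooKeeper for this coordination, and its assignment order may differ.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model: n fixed partitions, m nodes, m - n standby nodes.
final class FailoverSimulation {
    private final int partitions;                                 // n, fixed when the cluster is defined
    private final Map<Integer, String> owners = new HashMap<>();  // partition -> active node
    private final Deque<String> standby = new ArrayDeque<>();     // nodes waiting to take over

    FailoverSimulation(int partitions) {
        this.partitions = partitions;
    }

    // A starting node takes the first free partition, otherwise it becomes a standby node.
    void join(String node) {
        for (int p = 0; p < partitions; p++) {
            if (!owners.containsKey(p)) {
                owners.put(p, node);
                return;
            }
        }
        standby.addLast(node);
    }

    // Called once a node has been detected as dead (in S4: its ZooKeeper session expired).
    void fail(String node) {
        Integer lostPartition = null;
        for (Map.Entry<Integer, String> e : owners.entrySet()) {
            if (e.getValue().equals(node)) {
                lostPartition = e.getKey();
                break;
            }
        }
        if (lostPartition == null) {
            standby.remove(node);               // a failed standby node simply disappears
            return;
        }
        String replacement = standby.pollFirst();
        if (replacement == null) {
            owners.remove(lostPartition);       // no standby left: the partition is unavailable
        } else {
            owners.put(lostPartition, replacement);
        }
    }

    String ownerOf(int partition) {
        return owners.get(partition);
    }

    int standbyCount() {
        return standby.size();
    }
}
```

+<p>For instance, with 2 partitions and 3 nodes A, B and C, node C is the only standby node (3 - 2 = 1), and it takes over partition 0 when A fails.</p>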
+
+<h5 id="failure-detection-timeout">Failure detection timeout</h5>
+
+<p>ZooKeeper considers a node dead when it has not heard from it within the session timeout. The session timeout is specified by the client upon connection and, with the default server settings, must be at least twice and at most 20 times the tickTime (heartbeat) specified in the ZooKeeper ensemble configuration.</p>
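+<p>The bounds on the session timeout can be sketched as a simple clamping function. This sketch assumes the default ZooKeeper server settings (at least 2 and at most 20 times the tickTime); a server can change these bounds through its minSessionTimeout and maxSessionTimeout options, which are ignored here. The <code>SessionTimeouts</code> class is hypothetical and is not part of S4 or ZooKeeper.</p>

```java
// Hypothetical helper: how a ZooKeeper server bounds the session timeout requested
// by a client (e.g. an S4 node), assuming default server settings.
final class SessionTimeouts {
    private SessionTimeouts() {
    }

    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;    // lower bound: twice the tickTime
        int max = 20 * tickTimeMs;   // upper bound: 20 times the tickTime
        return Math.max(min, Math.min(max, requestedMs));
    }
}
```

+<p>For example, with a tickTime of 2000 ms, a requested timeout of 1000 ms is raised to 4000 ms, and a requested timeout of 60000 ms is capped at 40000 ms. A longer session timeout makes failure detection, and therefore fail-over, slower.</p>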
+
+<h1 id="checkpointing-and-recovery">Checkpointing and recovery</h1>
+
+<h3 id="a-closer-look-at-the-problem">A closer look at the problem</h3>
+
+<p>When a node crashes, the fail-over mechanism brings a fresh standby node into the cluster. When this node joins the cluster, it has no state and no instantiated PEs. As messages start arriving at this node, they trigger the instantiation of keyed PEs.</p>
+
+<p>If there is no checkpointing and recovery mechanism, those PEs start with an empty state.</p>
+
+<h3 id="s4-strategy-for-solving-the-problem">S4 strategy for solving the problem</h3>
+
+<p>For PEs to recover a previous state, the technique we use is to:</p>
+
+<ul>
+  <li><em>periodically checkpoint</em> the state of PEs across the S4 cluster</li>
+  <li><em>lazily recover</em> (triggered by messages)</li>
+</ul>
+
+<p>This means that if a previous state was checkpointed, and a new PE instance is created because a new key is seen, the PE instance fetches the corresponding checkpoint, restores the corresponding state, and only then starts processing events. State loss is therefore kept to a minimum.</p>
+
+<h3 id="design">Design</h3>
+
+<h5 id="checkpointing">Checkpointing</h5>
+
+<p>In order to minimize the latency, checkpointing is <em>uncoordinated</em> and <em>asynchronous</em>.
+Uncoordinated checkpointing means that each checkpoint is taken independently, without aiming at global consistency.
+Asynchronous checkpointing aims at minimizing the impact on the event processing execution path.</p>
+
+<p>Taking a checkpoint is a two-step operation, and both steps are handled outside of the event processing path:</p>
+
+<ul>
+  <li>serialize the PE instance</li>
+  <li>save the serialized PE instance to remote storage</li>
+</ul>
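+<p>The two steps above can be sketched with two bounded thread pools chained through <code>CompletableFuture</code>, so that the event processing thread only submits work and returns immediately. This is a hypothetical illustration, not the SafeKeeper implementation: the <code>AsyncCheckpointer</code> class is made up, the state is represented as an immutable <code>String</code> snapshot (so that it can safely be serialized on another thread), and an in-memory map stands in for remote storage. The pool sizes mirror the default values of the tuning parameters listed at the end of this page.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical two-stage asynchronous checkpointing pipeline (not S4's SafeKeeper).
final class AsyncCheckpointer {
    // Defaults mirrored: 1 thread, 120 s keep-alive, at most 1000 outstanding requests.
    private final ThreadPoolExecutor serializationPool = pool(1, 120, 1000);
    private final ThreadPoolExecutor storagePool = pool(1, 120, 1000);
    // In-memory stand-in for the remote storage backend.
    final Map<String, byte[]> storage = new ConcurrentHashMap<>();

    private static ThreadPoolExecutor pool(int threads, long keepAliveSeconds, int maxOutstanding) {
        ThreadPoolExecutor p = new ThreadPoolExecutor(threads, threads, keepAliveSeconds,
                TimeUnit.SECONDS, new ArrayBlockingQueue<>(maxOutstanding));
        p.allowCoreThreadTimeOut(true); // idle threads exit after keepAliveSeconds
        return p;
    }

    // Called from the event processing path: only submits work, then returns.
    CompletableFuture<Void> checkpoint(String key, String stateSnapshot) {
        return CompletableFuture
                .supplyAsync(() -> stateSnapshot.getBytes(StandardCharsets.UTF_8), serializationPool) // step 1: serialize
                .thenAcceptAsync(bytes -> storage.put(key, bytes), storagePool);                      // step 2: save
    }

    // Drains the serialization pool first, since it feeds the storage pool.
    void shutdown() {
        try {
            serializationPool.shutdown();
            serializationPool.awaitTermination(10, TimeUnit.SECONDS);
            storagePool.shutdown();
            storagePool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

+<p>Using separate pools keeps a slow storage backend from delaying serialization, and the bounded queues cap the number of outstanding checkpoint requests.</p>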
+
+<p>The following figure shows the various components involved: the checkpointing framework handles the serialization and passes serialized state to a pluggable storage backend:</p>
+
+<p><img src="/images/doc/0.6.0/checkpointing-framework.png" alt="image" /></p>
+
+<h5 id="recovery">Recovery</h5>
+
+<p>In order to optimize the usage of resources, recovery is <em>lazy</em>, which means it only happens when necessary.
+When a message for a new key arrives at the replacement S4 node, a new PE instance is created, and the system tries to fetch a previous checkpoint from storage. If there is a previous state, it is copied to the newly created PE instance (this implies deserializing the checkpointed object and copying its fields).</p>
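+<p>A minimal sketch of this lazy recovery logic, assuming a PE whose whole state is a single counter and a map standing in for the storage backend. <code>KeyedCounter</code> and <code>LazyRecoveringNode</code> are hypothetical names; in S4, PE instantiation and checkpoint fetching are handled by the platform and the checkpointing framework.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical keyed PE with a single piece of checkpointable state.
final class KeyedCounter {
    long count;

    void onEvent() {
        count++;
    }
}

// Hypothetical sketch of lazy, message-triggered recovery on a replacement node.
final class LazyRecoveringNode {
    private final Map<String, KeyedCounter> instances = new HashMap<>();
    private final Map<String, Long> checkpoints; // stand-in for the storage backend

    LazyRecoveringNode(Map<String, Long> checkpoints) {
        this.checkpoints = checkpoints;
    }

    void dispatch(String key) {
        KeyedCounter pe = instances.get(key);
        if (pe == null) {                       // first event for this key on this node
            pe = new KeyedCounter();            // new PE instance with an empty state
            Long saved = checkpoints.get(key);  // try to fetch a previous checkpoint
            if (saved != null) {
                pe.count = saved;               // copy the recovered state into the instance
            }
            instances.put(key, pe);
        }
        pe.onEvent();                           // only then process the event
    }

    long countFor(String key) {
        KeyedCounter pe = instances.get(key);
        return pe == null ? 0 : pe.count;
    }
}
```

+<p>With a checkpoint holding a count of 5 for key &#8220;alice&#8221;, the first event for &#8220;alice&#8221; on the new node yields a count of 6, whereas the first event for a key without a checkpoint yields 1.</p>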
+
+<h3 id="configuration-and-customization">Configuration and customization</h3>
+
+<h5 id="requirements">Requirements</h5>
+
+<p>A PE can be checkpointed if:</p>
+
+<ul>
+  <li>the PE class provides an empty no-arg constructor (this restriction should be lifted in future releases)</li>
+  <li>the state to checkpoint is held in non-transient, serializable fields (conversely, transient fields are never checkpointed)</li>
+</ul>
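+<p>The following sketch illustrates these requirements on a hypothetical state class. It uses Java's built-in serialization purely as a stand-in for the serializer actually used by S4 (kryo): Java serialization itself does not need the no-arg constructor, which is kept only to mirror the first requirement, but it does leave transient fields out, so they come back with their default value (<code>null</code>) after a round trip. <code>TopicCountState</code> and <code>CheckpointRoundTrip</code> are made-up names.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical PE state illustrating the requirements above.
class TopicCountState implements Serializable {
    private static final long serialVersionUID = 1L;

    long total;                                             // non-transient: part of the checkpoint
    transient StringBuilder scratch = new StringBuilder();  // transient: never checkpointed

    TopicCountState() {                                     // empty no-arg constructor
    }
}

// Round-trips an object through Java serialization (a stand-in for kryo).
final class CheckpointRoundTrip {
    private CheckpointRoundTrip() {
    }

    static byte[] serialize(Object state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```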
+
+<h5 id="checkpointing-application-configuration">Checkpointing application configuration</h5>
+
+<p>Checkpointing intervals are defined per prototype, either as time intervals or as event counts (for now). They are specified in the application code, by calling the <code>setCheckpointingConfig()</code> method of the ProcessingElement class with a <code>CheckpointingConfig</code> object. Please refer to the API documentation for details.</p>
+
+<p>The twitter example application shipped in the distribution is already configured for enabling checkpointing. See the <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=blob;f=test-apps/twitter-counter/src/main/java/org/apache/s4/example/twitter/TwitterCounterApp.java;h=5d7855fa5aee6cbe693fa47c1ebad03da316f42b">TwitterCounterApp</a> class.</p>
+
+<p>For instance, here is how to specify a checkpointing interval of 20 seconds on the TopNTopicPE prototype:</p>
+
+<pre><code>import java.util.concurrent.TimeUnit;
+
+topNTopicPE.setCheckpointingConfig(new CheckpointingConfig.Builder(CheckpointingMode.TIME)
+        .frequency(20).timeUnit(TimeUnit.SECONDS).build());
+</code></pre>
+
+<h5 id="enabling-checkpointing">Enabling checkpointing</h5>
+
+<p>Checkpointing is enabled through the node configuration. You need to inject a checkpointing module that specifies a CheckpointingFramework implementation (please use org.apache.s4.core.ft.SafeKeeper) and a backend storage implementation. The backend storage implements the StateStorage interface.</p>
+
+<p>We provide a default module (FileSystemBackendCheckpointingModule) that uses a file system backend (DefaultFileSystemStateStorage). It can be used with an NFS setup and introduces no additional dependency. You may use it by starting an S4 node in the following manner:</p>
+
+<pre><code>./s4 node -c=cluster1 -emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule
+</code></pre>
+
+<h5 id="customizing-the-checkpointing-backend">Customizing the checkpointing backend</h5>
+
+<p>It is quite straightforward to implement backends for other kinds of storage (key-value stores, data grids, caches, RDBMS). Using an alternative backend only requires providing a new module to the S4 node. Here is an example of a module using a hypothetical &#8216;Cool&#8217; backend implementation:</p>
+
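+<pre><code>import com.google.inject.AbstractModule;
+import org.apache.s4.core.ft.CheckpointingFramework;
+import org.apache.s4.core.ft.SafeKeeper;
+import org.apache.s4.core.ft.StateStorage;
+
+public class CoolBackendCheckpointingModule extends AbstractModule {
+    @Override
+    protected void configure() {
+        // pluggable storage backend: CoolStateStorage is your own StateStorage implementation
+        bind(StateStorage.class).to(CoolStateStorage.class);
+        // checkpointing framework: use the provided SafeKeeper
+        bind(CheckpointingFramework.class).to(SafeKeeper.class);
+    }
+}
+</code></pre>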
+
+<h5 id="overriding-checkpointing-and-recovery-operations">Overriding checkpointing and recovery operations</h5>
+
+<p>By default, S4 uses <a href="http://code.google.com/p/kryo">kryo</a> to serialize and deserialize checkpoints, but it is possible to use a different mechanism, by overriding the <code>checkpoint()</code>, <code>serializeState()</code> and <code>restoreState()</code> methods of the <code>ProcessingElement</code> class.</p>
+
+<p>PEs are eligible for checkpointing when their state is &#8216;dirty&#8217;. The dirty flag is checked through the <code>isDirty()</code> method, and cleared by calling the <code>clearDirty()</code> method. Depending on the application code, only some of the events may actually change the state of a PE. In that case, you should override these methods in order to avoid unnecessary checkpointing operations.</p>
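+<p>As an illustration, here is a hypothetical PE-like class that only marks itself as dirty when an event actually changes its state (here, when a new maximum value is seen). It does not extend S4's <code>ProcessingElement</code>; the <code>isDirty()</code> and <code>clearDirty()</code> names follow the text above, but the real method signatures may differ.</p>

```java
// Hypothetical PE-like class: only events that raise the maximum change its state.
final class MaxTrackerPE {
    private long max = Long.MIN_VALUE;
    private boolean dirty = false;

    void onEvent(long value) {
        if (value > max) {       // state changes only for a new maximum
            max = value;
            dirty = true;        // mark as needing a checkpoint
        }
    }

    boolean isDirty() {          // consulted before taking a checkpoint
        return dirty;
    }

    void clearDirty() {          // called once the state has been checkpointed
        dirty = false;
    }

    long max() {
        return max;
    }
}
```

+<p>Events that do not raise the maximum leave the flag untouched, so no checkpoint is scheduled for them.</p>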
+
+<h5 id="tuning">Tuning</h5>
+
+<p>The checkpointing framework has a number of overridable parameters, mostly for sizing thread pools:</p>
+
+<ul>
+  <li>Serialization thread pool
+    <ul>
+      <li>s4.checkpointing.serializationMaxThreads (default = 1)</li>
+      <li>s4.checkpointing.serializationThreadKeepAliveSeconds (default = 120)</li>
+      <li>s4.checkpointing.serializationMaxOutstandingRequests (default = 1000)</li>
+    </ul>
+  </li>
+  <li>Storage backend thread pool
+    <ul>
+      <li>s4.checkpointing.storageMaxThreads (default = 1)</li>
+      <li>s4.checkpointing.storageThreadKeepAliveSeconds (default = 120)</li>
+      <li>s4.checkpointing.storageMaxOutstandingRequests (default = 1000)</li>
+    </ul>
+  </li>
+  <li>Fetching thread pool: fetching is a blocking operation, which can time out:
+    <ul>
+      <li>s4.checkpointing.fetchingMaxThreads (default = 1)</li>
+      <li>s4.checkpointing.fetchingThreadKeepAliveSeconds (default = 120)</li>
+      <li>s4.checkpointing.fetchingMaxWaitMs (default = 1000; this is the fetch timeout)</li>
+    </ul>
+  </li>
+  <li>If the backend is unresponsive, fetching can be temporarily disabled, bypassing the backend:
+    <ul>
+      <li>s4.checkpointing.fetchingMaxConsecutiveFailuresBeforeDisabling (default = 10)</li>
+      <li>s4.checkpointing.fetchingDisabledDurationMs (default = 600000, i.e. 10 minutes)</li>
+    </ul>
+  </li>
+</ul></div>
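+<p>The bypass behaviour can be sketched as a small guard that counts consecutive fetch failures and disables fetching for a fixed duration once the threshold is reached. This is a hypothetical sketch of the behaviour suggested by the parameter names, not S4's actual code; the <code>FetchGuard</code> class is made up, and a clock is injected so the behaviour can be checked without waiting.</p>

```java
import java.util.function.LongSupplier;

// Hypothetical guard that stops fetching from an unresponsive backend for a while.
final class FetchGuard {
    private final int maxConsecutiveFailures; // cf. s4.checkpointing.fetchingMaxConsecutiveFailuresBeforeDisabling
    private final long disabledDurationMs;    // cf. s4.checkpointing.fetchingDisabledDurationMs
    private final LongSupplier clockMs;       // injectable clock, in milliseconds
    private int consecutiveFailures = 0;
    private long disabledUntilMs = Long.MIN_VALUE;

    FetchGuard(int maxConsecutiveFailures, long disabledDurationMs, LongSupplier clockMs) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.disabledDurationMs = disabledDurationMs;
        this.clockMs = clockMs;
    }

    // When false, the caller should skip fetching from the backend.
    boolean fetchingEnabled() {
        return clockMs.getAsLong() >= disabledUntilMs;
    }

    void recordSuccess() {
        consecutiveFailures = 0;
    }

    // Records a failed or timed-out fetch (cf. s4.checkpointing.fetchingMaxWaitMs).
    void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= maxConsecutiveFailures) {
            disabledUntilMs = clockMs.getAsLong() + disabledDurationMs;
            consecutiveFailures = 0;
        }
    }
}
```

+<p>With the default values, 10 consecutive failures disable fetching for 600000 ms (10 minutes), after which fetching is attempted again.</p>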
+    </div>
+    <div id='footer'>
+      <div class='container'>
+        <span class='copyright'>Apache S4 - Copyright 2013 The Apache Software Foundation</span>
+      </div>
+    </div>
+  </body>
+</html>

Added: incubator/s4/site/doc/0.6.0/index.html
URL: http://svn.apache.org/viewvc/incubator/s4/site/doc/0.6.0/index.html?rev=1449398&view=auto
==============================================================================
--- incubator/s4/site/doc/0.6.0/index.html (added)
+++ incubator/s4/site/doc/0.6.0/index.html Sat Feb 23 19:45:18 2013
@@ -0,0 +1,101 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html>
+  <head>
+    <title>S4: S4 0.6.0</title>
+    <meta content='text/html; charset=utf-8' http-equiv='Content-Type' />
+    <meta content='A general-purpose distributed stream computing platform' name='description' />
+    <link href='/style/screen.css' media='screen' rel='stylesheet' type='text/css' />
+    <link href='/style/print.css' media='print' rel='stylesheet' type='text/css' />
+    <!--[if lt IE 9]>
+      <link href='/style/ie.css' media='screen' rel='stylesheet' type='text/css' />
+    <![endif]-->
+    <link href='/style/style.css' rel='stylesheet' type='text/css' />
+    <link href='/style/nav.css' rel='stylesheet' type='text/css' />
+    <script type='text/javascript'>
+        var _gaq = _gaq || [];
+        _gaq.push(['_setAccount', 'UA-19490961-1']);
+        _gaq.push(['_setDomainName', '.s4.io']);
+        _gaq.push(['_trackPageview']);
+        (function() {
+          var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+          ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+          var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+        })();
+      </script>
+  </head>
+  <body>
+    <div id='header'>
+      <div class='container'>
+        <div id='logo'>
+          <a href='/'>
+            <img src='/images/s4_test.png' />
+          </a>
+        </div>
+        <div id='navbar'><ul id='nav'>
+  <li>
+    <a href='/'>home</a>
+  </li>
+  <li>
+    <a href='https://cwiki.apache.org/confluence/display/S4/S4+Wiki'>doc [0.5]</a>
+  </li>
+  <li>
+    <a href='http://git-wip-us.apache.org/repos/asf?p=incubator-s4.git' onClick="_gaq.push(['_trackEvent', 'External', 'Apache Git', 'http://github.com/s4']);">code</a>
+  </li>
+  <li>
+    <a href='http://people.apache.org/~mmorel/apache-s4-0.5.0-incubating-doc/javadoc/'>API</a>
+  </li>
+  <li>
+    <a href='/contrib'>get involved</a>
+  </li>
+  <li>
+    <a href='/team'>team</a>
+  </li>
+  <li>
+    <a href='/download'>download</a>
+  </li>
+</ul></div>
+      </div>
+    </div>
+    <div id='wrapper'>
+      <div class='container' id='container'><blockquote>
+  <p>This is the documentation for S4 0.6.0. For previous versions, please refer to the <a href="https://cwiki.apache.org/confluence/display/S4/S4+Wiki">wiki</a>.</p>
+</blockquote>
+
+<p>S4 (Simple Scalable Streaming System) is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous, unbounded streams of data.</p>
+
+<h2 id="getting-started">Getting Started</h2>
+
+<ul>
+  <li>You may start with an <a href="overview">overview</a> of the platform</li>
+  <li>Then follow a <a href="walkthrough">walkthrough</a> for a hands-on introduction</li>
+  <li>And <a href="dev_tips">here</a> are some tips to ease the development process</li>
+</ul>
+
+<h2 id="configuration">Configuration</h2>
+
+<ul>
+  <li>How to <a href="configuration">customize the platform and pass configuration parameters</a></li>
+  <li>How to <a href="application_dependencies">add application dependencies</a></li>
+  <li>How to <a href="event_dispatch">dispatch events</a> within an application and between applications</li>
+</ul>
+
+<h2 id="features">Features</h2>
+
+<ul>
+  <li>Details about <a href="fault_tolerance">fault tolerance</a></li>
+</ul>
+
+<h2 id="troubleshooting">Troubleshooting</h2>
+
+<ul>
+  <li>Try the <a href="https://cwiki.apache.org/confluence/display/S4/FAQ">FAQ</a></li>
+  <li>Try the <a href="https://cwiki.apache.org/S4/s4-apache-mailing-lists.html">mailing lists</a></li>
+</ul></div>
+    </div>
+    <div id='footer'>
+      <div class='container'>
+        <span class='copyright'>Apache S4 - Copyright 2013 The Apache Software Foundation</span>
+      </div>
+    </div>
+  </body>
+</html>

Added: incubator/s4/site/doc/0.6.0/overview/index.html
URL: http://svn.apache.org/viewvc/incubator/s4/site/doc/0.6.0/overview/index.html?rev=1449398&view=auto
==============================================================================
--- incubator/s4/site/doc/0.6.0/overview/index.html (added)
+++ incubator/s4/site/doc/0.6.0/overview/index.html Sat Feb 23 19:45:18 2013
@@ -0,0 +1,172 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html>
+  <head>
+    <title>S4: S4 0.6.0 overview</title>
+    <meta content='text/html; charset=utf-8' http-equiv='Content-Type' />
+    <meta content='A general-purpose distributed stream computing platform' name='description' />
+    <link href='/style/screen.css' media='screen' rel='stylesheet' type='text/css' />
+    <link href='/style/print.css' media='print' rel='stylesheet' type='text/css' />
+    <!--[if lt IE 9]>
+      <link href='/style/ie.css' media='screen' rel='stylesheet' type='text/css' />
+    <![endif]-->
+    <link href='/style/style.css' rel='stylesheet' type='text/css' />
+    <link href='/style/nav.css' rel='stylesheet' type='text/css' />
+    <script type='text/javascript'>
+        var _gaq = _gaq || [];
+        _gaq.push(['_setAccount', 'UA-19490961-1']);
+        _gaq.push(['_setDomainName', '.s4.io']);
+        _gaq.push(['_trackPageview']);
+        (function() {
+          var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+          ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+          var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+        })();
+      </script>
+  </head>
+  <body>
+    <div id='header'>
+      <div class='container'>
+        <div id='logo'>
+          <a href='/'>
+            <img src='/images/s4_test.png' />
+          </a>
+        </div>
+        <div id='navbar'><ul id='nav'>
+  <li>
+    <a href='/'>home</a>
+  </li>
+  <li>
+    <a href='https://cwiki.apache.org/confluence/display/S4/S4+Wiki'>doc [0.5]</a>
+  </li>
+  <li>
+    <a href='http://git-wip-us.apache.org/repos/asf?p=incubator-s4.git' onClick="_gaq.push(['_trackEvent', 'External', 'Apache Git', 'http://github.com/s4']);">code</a>
+  </li>
+  <li>
+    <a href='http://people.apache.org/~mmorel/apache-s4-0.5.0-incubating-doc/javadoc/'>API</a>
+  </li>
+  <li>
+    <a href='/contrib'>get involved</a>
+  </li>
+  <li>
+    <a href='/team'>team</a>
+  </li>
+  <li>
+    <a href='/download'>download</a>
+  </li>
+</ul></div>
+      </div>
+    </div>
+    <div id='wrapper'>
+      <div class='container' id='container'><h1 id="what-is-s4">What is S4?</h1>
+
+<p>S4 is a general-purpose, near real-time, distributed, decentralized, scalable, event-driven, modular platform that allows programmers to easily implement applications for processing continuous, unbounded streams of data.</p>
+
+<p>S4 0.5 focused on providing a functionally complete refactoring.</p>
+
+<p>S4 0.6 builds on this foundation and brings several new features, in particular:</p>
+
+<ul>
+  <li><em>performance improvements</em>: stream throughput improved by 1000% (~200k messages/s/stream)</li>
+  <li>improved <a href="configuration">configurability</a>, for both the S4 platform and deployed applications</li>
+  <li><em>elasticity</em> and fine partition tuning, through an integration with Apache Helix</li>
+</ul>
+
+<h1 id="what-are-the-cool-features">What are the cool features?</h1>
+
+<p><strong>Flexible deployment</strong>:</p>
+
+<ul>
+  <li>By default, keys are spread homogeneously across the cluster, which helps balance the load, especially with fine-grained partitioning</li>
+  <li>S4 also provides fine control over the partitioning</li>
+  <li>Features automatic rebalancing</li>
+</ul>
+
+<p><strong>Modular design</strong>:</p>
+
+<ul>
+  <li>both the platform and the applications are built by dependency injection, and configured through independent modules.</li>
+  <li>makes it easy to customize the system according to specific requirements</li>
+  <li>pluggable event serving policies: load shedding, throttling, blocking</li>
+</ul>
+
+<p><strong>Dynamic and loose coupling of S4 applications</strong>:</p>
+
+<ul>
+  <li>through a pub-sub mechanism</li>
+  <li>makes it easy to:
+    <ul>
+      <li>assemble subsystems into larger systems</li>
+      <li>reuse applications</li>
+      <li>separate pre-processing</li>
+      <li>provision, control and update subsystems independently</li>
+    </ul>
+  </li>
+</ul>
+
+<p><strong><a href="fault_tolerance">Fault tolerant</a></strong></p>
+
+<ul>
+  <li><em>Fail-over</em> mechanism for high availability</li>
+  <li><em>Checkpointing and recovery</em> mechanism for minimizing state loss</li>
+</ul>
+
+<p><strong>Pure Java</strong>: statically typed, easy to understand, to refactor, and to extend</p>
+
+<h1 id="how-does-it-work">How does it work?</h1>
+
+<h2 id="some-definitions">Some definitions</h2>
+
+<p><strong>Platform</strong></p>
+
+<ul>
+  <li>S4 provides a distributed runtime platform that handles communication, scheduling and distribution across containers.</li>
+  <li>Distributed containers are called <em>S4 nodes</em></li>
+  <li>S4 nodes are deployed on <em>S4 clusters</em></li>
+  <li>S4 clusters define named ensembles of S4 nodes, with a fixed size</li>
+  <li>The size of an S4 cluster corresponds to the number of logical <em>partitions</em> (sometimes referred to as <em>tasks</em>)</li>
+</ul>
+
+<p><strong>Applications</strong></p>
+
+<ul>
+  <li>Users develop applications and deploy them on S4 clusters</li>
+  <li>Applications are built from:
+    <ul>
+      <li><em>Processing elements</em> (PEs)</li>
+      <li><em>Streams</em> that interconnect PEs</li>
+    </ul>
+  </li>
+  <li>PEs communicate asynchronously by sending <em>events</em> on streams.</li>
+  <li>Events are dispatched to nodes according to their key</li>
+</ul>
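+<p>Key-based dispatch can be illustrated by hashing the key and taking the result modulo the number of partitions, so that every event with a given key is routed to the same partition. This is a hypothetical sketch: <code>KeyPartitioner</code> is made up, and S4's actual hash function may differ.</p>

```java
import java.util.Objects;

// Hypothetical key-to-partition mapping: hash the key, then take it modulo the partition count.
final class KeyPartitioner {
    private final int partitionCount;

    KeyPartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    int partitionFor(String key) {
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes
        return Math.floorMod(Objects.requireNonNull(key).hashCode(), partitionCount);
    }
}
```

+<p>Because the mapping only depends on the key and on the fixed number of partitions, it gives the same result wherever it is computed.</p>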
+
+<p><strong>External streams</strong> are a special kind of stream that:</p>
+
+<ul>
+  <li>send events outside of the application</li>
+  <li>receive events from external sources</li>
+  <li>can interconnect and assemble applications into larger systems.</li>
+</ul>
+
+<p><strong>Adapters</strong> are S4 applications that can convert external streams into streams of S4 events. Since adapters are also S4 applications, they can be scaled easily.</p>
+
+<h2 id="a-hierarchical-perspective-on-s4">A hierarchical perspective on S4</h2>
+
+<p>The following diagram sums up the key concepts in a hierarchical fashion:</p>
+
+<p><img src="/images/doc/0.6.0/S4_hierarchical_archi.png" alt="image" /></p>
+
+<h1 id="where-can-i-find-more-information">Where can I find more information?</h1>
+
+<ul>
+  <li><a href="http://incubator.apache.org/s4/">The website</a> is a good starting point.</li>
+  <li><a href="https://cwiki.apache.org/confluence/display/S4/">The wiki</a> currently contains the most up-to-date information: general information, configuration, examples.</li>
+  <li>Questions can be asked through the <a href="https://cwiki.apache.org/confluence/display/S4/S4+Apache+mailing+lists">mailing lists</a></li>
+  <li>The source code is available through <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git">git</a>; <a href="http://incubator.apache.org/s4/contrib/">here</a> are instructions for fetching the code.</li>
+  <li>A nice set of <a href="http://www.slideshare.net/leoneu/20111104-s4-overview">slides</a> was used for a presentation at Stanford in November 2011.</li>
+  <li>The driving ideas are detailed in a <a href="http://www.4lunas.org/pub/2010-s4.pdf">conference publication</a> from KDCloud&#8217;11 (joint workshop with ICDM&#8217;11)</li>
+</ul></div>
+    </div>
+    <div id='footer'>
+      <div class='container'>
+        <span class='copyright'>Apache S4 - Copyright 2013 The Apache Software Foundation</span>
+      </div>
+    </div>
+  </body>
+</html>

Added: incubator/s4/site/doc/0.6.0/walkthrough/index.html
URL: http://svn.apache.org/viewvc/incubator/s4/site/doc/0.6.0/walkthrough/index.html?rev=1449398&view=auto
==============================================================================
--- incubator/s4/site/doc/0.6.0/walkthrough/index.html (added)
+++ incubator/s4/site/doc/0.6.0/walkthrough/index.html Sat Feb 23 19:45:18 2013
@@ -0,0 +1,462 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html>
+  <head>
+    <title>S4: Walkthrough</title>
+    <meta content='text/html; charset=utf-8' http-equiv='Content-Type' />
+    <meta content='A general-purpose distributed stream computing platform' name='description' />
+    <link href='/style/screen.css' media='screen' rel='stylesheet' type='text/css' />
+    <link href='/style/print.css' media='print' rel='stylesheet' type='text/css' />
+    <!--[if lt IE 9]>
+      <link href='/style/ie.css' media='screen' rel='stylesheet' type='text/css' />
+    <![endif]-->
+    <link href='/style/style.css' rel='stylesheet' type='text/css' />
+    <link href='/style/nav.css' rel='stylesheet' type='text/css' />
+    <script type='text/javascript'>
+        var _gaq = _gaq || [];
+        _gaq.push(['_setAccount', 'UA-19490961-1']);
+        _gaq.push(['_setDomainName', '.s4.io']);
+        _gaq.push(['_trackPageview']);
+        (function() {
+          var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+          ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+          var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+        })();
+      </script>
+  </head>
+  <body>
+    <div id='header'>
+      <div class='container'>
+        <div id='logo'>
+          <a href='/'>
+            <img src='/images/s4_test.png' />
+          </a>
+        </div>
+        <div id='navbar'><ul id='nav'>
+  <li>
+    <a href='/'>home</a>
+  </li>
+  <li>
+    <a href='https://cwiki.apache.org/confluence/display/S4/S4+Wiki'>doc [0.5]</a>
+  </li>
+  <li>
+    <a href='http://git-wip-us.apache.org/repos/asf?p=incubator-s4.git' onClick="_gaq.push(['_trackEvent', 'External', 'Apache Git', 'http://github.com/s4']);">code</a>
+  </li>
+  <li>
+    <a href='http://people.apache.org/~mmorel/apache-s4-0.5.0-incubating-doc/javadoc/'>API</a>
+  </li>
+  <li>
+    <a href='/contrib'>get involved</a>
+  </li>
+  <li>
+    <a href='/team'>team</a>
+  </li>
+  <li>
+    <a href='/download'>download</a>
+  </li>
+</ul></div>
+      </div>
+    </div>
+    <div id='wrapper'>
+      <div class='container' id='container'><blockquote>
+  <p>Improvements from S4 0.5.0 include a more convenient configuration system, illustrated here: all platform and application parameters are specified when configuring/deploying the app.</p>
+</blockquote>
+
+<h1 id="install-s4">Install S4</h1>
+
+<p>There are 2 ways:</p>
+
+<ul>
+  <li><a href="http://incubator.apache.org/s4/download/">Download</a> the 0.6.0 release </li>
+</ul>
+
+<blockquote>
+  <p>We recommend getting the &#8220;source&#8221; release and building it</p>
+</blockquote>
+
+<ul>
+  <li>or check out the code from the Apache git repository, by following the <a href="/contrib">instructions</a>. The 0.6.0 tag corresponds to the current release.</li>
+</ul>
+
+<p>If you get the binary release, s4 scripts are immediately available. Otherwise you must build the project:</p>
+
+<ul>
+  <li>
+    <p>Compile and install S4 in the local Maven repository (remove the -DskipTests option if you also want to run the tests):</p>
+
+    <pre><code>  S4:incubator-s4$ ./gradlew install -DskipTests
+  .... verbose logs ...
+</code></pre>
+  </li>
+  <li>
+    <p>Build the startup scripts:</p>
+
+    <pre><code>  S4:incubator-s4$ ./gradlew s4-tools:installApp
+  .... verbose logs 
+  ...:s4-tools:installApp
+</code></pre>
+  </li>
+</ul>
+
+<hr />
+
+<h1 id="start-a-new-application">Start a new application</h1>
+
+<p>S4 provides some scripts in order to simplify development and testing of applications. Let&#8217;s see how to create a new project and start a sample application.</p>
+
+<h2 id="create-a-new-project">Create a new project</h2>
+
+<ul>
+  <li>
+    <p>Create a new application template (here, we create it in the /tmp directory):</p>
+
+    <pre><code>  S4:incubator-s4$ ./s4 newApp myApp -parentDir=/tmp
+  ... some instructions on how to start ...
+</code></pre>
+  </li>
+  <li>
+    <p>This creates a sample application in the specified directory, with the following structure:</p>
+
+    <pre><code>  build.gradle  --&gt; the template build file, that you'll need to customize
+  gradlew --&gt; references the gradlew script from the S4 installation
+  s4 --&gt; references the s4 script from the S4 installation, and adds an "adapter" task
+  src/ --&gt; sources (maven-like structure)
+</code></pre>
+  </li>
+</ul>
+
+<h2 id="have-a-look-at-the-sample-project-content">Have a look at the sample project content</h2>
+
+<p>The src/main/java/hello directory contains 3 files:</p>
+
+<ul>
+  <li>
+    <p>HelloPE.java: a very simple PE that prints the name contained in incoming events</p>
+
+    <pre><code>  package hello;
+
+  import org.apache.s4.base.Event;
+  import org.apache.s4.core.ProcessingElement;
+
+  // ProcessingElement provides integration with the S4 platform
+  public class HelloPE extends ProcessingElement {
+
+      // you should define downstream streams here and inject them in the app definition
+
+      // PEs can maintain some state
+      boolean seen = false;
+
+      // This method is called upon a new Event on an incoming stream.
+      // You may overload it for handling instances of your own specialized subclasses of Event
+      public void onEvent(Event event) {
+          System.out.println("Hello " + (seen ? "again " : "") + event.get("name") + "!");
+          seen = true;
+      }
+      // skipped remaining methods
+  }
+</code></pre>
+  </li>
+  <li>
+    <p>HelloApp.java: defines a simple application that exposes an input stream (&#8220;names&#8221;), connected to the HelloPE. See <a href="event_dispatch">the event dispatch configuration page</a> for more information about how events are dispatched.</p>
+
+    <pre><code>  package hello;
+
+  import java.util.Arrays;
+  import java.util.List;
+
+  import org.apache.s4.base.Event;
+  import org.apache.s4.base.KeyFinder;
+  import org.apache.s4.core.App;
+
+  // App parent class provides integration with the S4 platform
+  public class HelloApp extends App {
+
+      @Override
+      protected void onStart() {
+      }
+
+      @Override
+      protected void onInit() {
+          // That's where we define PEs and streams
+          // create a prototype
+          HelloPE helloPE = createPE(HelloPE.class);
+          // Create a stream that listens to the "names" stream and passes events to the helloPE instance.
+          createInputStream("names", new KeyFinder&lt;Event&gt;() {
+              // the KeyFinder is used to identify keys
+              @Override
+              public List&lt;String&gt; get(Event event) {
+                  return Arrays.asList(new String[] { event.get("name") });
+              }
+          }, helloPE);
+      }
+      // skipped remaining methods
+  }
+</code></pre>
+  </li>
+  <li>
+    <p>HelloInputAdapter is a simple adapter that reads character lines from a socket, converts them into events, and sends the events to interested S4 apps, through the &#8220;names&#8221; stream</p>
+  </li>
+</ul>
+
+<h2 id="run-the-sample-app">Run the sample app</h2>
+
+<p>In order to run an S4 application, you need to:</p>
+
+<ul>
+  <li>set up a cluster: provision a cluster and start S4 nodes for that cluster</li>
+  <li>package the app</li>
+  <li>publish the app on the cluster</li>
+</ul>
+
+<h1 id="set-up-the-cluster">Set up the cluster</h1>
+
+<ul>
+  <li>
+    <p>In 2 steps:</p>
+
+    <ol>
+      <li>
+        <p>Start a ZooKeeper server instance (the -clean option removes previous ZooKeeper data, if any):</p>
+
+        <pre><code> S4:myApp$ ./s4 zkServer -clean
+ calling referenced s4 script : /Users/S4/tmp/incubator-s4/s4
+ [main] INFO  org.apache.s4.tools.ZKServer - Starting zookeeper server on port [2181]
+ [main] INFO  org.apache.s4.tools.ZKServer - cleaning existing data in [/var/folders/8V/8VdgKWU3HCiy2yV4dzFpDk+++TI/-Tmp-/tmp/zookeeper/data] and [/var/folders/8V/8VdgKWU3HCiy2yV4dzFpDk+++TI/-Tmp-/tmp/zookeeper/log]
+</code></pre>
+      </li>
+      <li>
+        <p>Define a new cluster, for instance a cluster named &#8220;cluster1&#8221; with 2 partitions, whose nodes listen on ports starting from 12000:</p>
+
+        <pre><code> S4:myApp$ ./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000
+ calling referenced s4 script : /Users/S4/tmp/incubator-s4/s4
+ [main] INFO  org.apache.s4.tools.DefineCluster - preparing new cluster [cluster1] with [2] node(s)
+ [main] INFO  org.apache.s4.tools.DefineCluster - New cluster configuration uploaded into zookeeper
+</code></pre>
+      </li>
+    </ol>
+  </li>
+  <li>
+    <p>Alternatively, you can combine these two steps into one by passing the cluster configuration inline to the <code>zkServer</code> command:</p>
+
+    <pre><code>  S4:incubator-s4$ ./s4 zkServer -clusters=c=cluster1:flp=12000:nbTasks=2 -clean
+</code></pre>
+  </li>
+  <li>
+    <p>Start 2 S4 nodes with the default configuration, and attach them to cluster &#8220;cluster1&#8221;:</p>
+
+    <pre><code>  S4:myApp$ ./s4 node -c=cluster1
+  calling referenced s4 script : /Users/S4/tmp/incubator-s4/s4
+  15:50:18.996 [main] INFO  org.apache.s4.core.Main - Initializing S4 node with :
+  - comm module class [org.apache.s4.comm.DefaultCommModule]
+  - comm configuration file [default.s4.comm.properties from classpath]
+  - core module class [org.apache.s4.core.DefaultCoreModule]
+  - core configuration file[default.s4.core.properties from classpath]
+  -extra modules: []
+  [main] INFO  org.apache.s4.core.Main - Starting S4 node. This node will automatically download applications published for the cluster it belongs to
+</code></pre>
+
+    <p>Then start the second node the same way (for example, in another shell):</p>
+
+    <pre><code>  S4:myApp$ ./s4 node -c=cluster1
+</code></pre>
+  </li>
+  <li>Build, package and publish the app to cluster1:
+    <ul>
+      <li>This is done in 2 separate steps:
+        <ol>
+          <li>
+            <p>Create an s4r archive. The following command creates an archive named myApp.s4r (you may choose any name) in build/libs.
+As before, specifying the app class is optional:</p>
+
+            <pre><code> ./s4 s4r -a=hello.HelloApp -b=`pwd`/build.gradle myApp
+</code></pre>
+          </li>
+          <li>
+            <p>Publish the s4r archive (you may first copy it to a more suitable location). The application name is arbitrary:</p>
+
+            <pre><code> ./s4 deploy -s4r=`pwd`/build/libs/myApp.s4r -c=cluster1 -appName=myApp
+</code></pre>
+          </li>
+        </ol>
+      </li>
+    </ul>
+  </li>
+  <li>
+    <p>S4 nodes will detect the new application, download it, load it and start it. You will get something like:</p>
+
+    <pre><code>  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s.d.DistributedDeploymentManager - Detected new application(s) to deploy {}[myApp]
+  [ZkClient-EventThread-15-localhost:2181] INFO  org.apache.s4.core.Server - Local app deployment: using s4r file name [myApp] as application name
+  [ZkClient-EventThread-15-localhost:2181] INFO  org.apache.s4.core.Server - App class name is: hello.HelloApp
+  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to { nbNodes=0,name=unknown,mode=unicast,type=,nodes=[]} from null
+  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Adding topology change listener:org.apache.s4.comm.tcp.TCPEmitter@79b2591c
+  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s.comm.topology.AssignmentFromZK - New session:87684175268872203; state is : SyncConnected
+  [ZkClient-EventThread-19-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to { nbNodes=1,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=myMachine.myNetwork,taskId=Task-0}]} from { nbNodes=0,name=unknown,mode=unicast,type=,nodes=[]}
+  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s.comm.topology.AssignmentFromZK - Successfully acquired task:Task-1 by myMachine.myNetwork
+  [ZkClient-EventThread-19-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to { nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=myMachine.myNetwork,taskId=Task-0}, {partition=1,port=12001,machineName=myMachine.myNetwork,taskId=Task-1}]} from { nbNodes=1,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=myMachine.myNetwork,taskId=Task-0}]}
+  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClustersFromZK - New session:87684175268872205
+  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClustersFromZK - Detected new stream [names]
+  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClustersFromZK - New session:87684175268872206
+  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to { nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=myMachine.myNetwork,taskId=Task-0}, {partition=1,port=12001,machineName=myMachine.myNetwork,taskId=Task-1}]} from null
+  [ZkClient-EventThread-15-localhost:2181] INFO  org.apache.s4.core.Server - Loaded application from file /tmp/deploy-test/cluster1/myApp.s4r
+  [ZkClient-EventThread-15-localhost:2181] INFO  o.a.s.d.DistributedDeploymentManager - Successfully installed application myApp
+  [ZkClient-EventThread-15-localhost:2181] DEBUG o.a.s.c.g.OverloadDispatcherGenerator - Dumping generated overload dispatcher class for PE of class [class hello.HelloPE]
+  [ZkClient-EventThread-15-localhost:2181] DEBUG o.a.s4.comm.topology.ClustersFromZK - Adding input stream [names] for app [-1] in cluster [cluster1]
+  [ZkClient-EventThread-15-localhost:2181] INFO  org.apache.s4.core.App - Init prototype [hello.HelloPE].
+</code></pre>
+  </li>
+</ul>
+
+<p>Great! The application is now deployed on 2 S4 nodes.</p>
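+<p>In the deployment log above, the nodes of cluster1 (defined with <code>-nbTasks=2 -flp=12000</code>) listen on port 12000 for partition 0 and port 12001 for partition 1. Assuming ports are simply assigned consecutively from the first listening port, the hypothetical helper below computes the port you can expect for each partition. <code>ClusterPortsSketch</code> is not part of the S4 tools; it only makes that assumption explicit.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of the S4 tools.
final class ClusterPortsSketch {

    // Assumes partition i listens on firstListeningPort + i, which matches
    // the log above (partition 0 -> 12000, partition 1 -> 12001).
    static List<Integer> portsFor(int nbTasks, int firstListeningPort) {
        List<Integer> ports = new ArrayList<>();
        for (int partition = 0; partition < nbTasks; partition++) {
            ports.add(firstListeningPort + partition);
        }
        return ports;
    }

    public static void main(String[] args) {
        // Mirrors: ./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000
        System.out.println("cluster1: " + portsFor(2, 12000));
        // Mirrors: ./s4 newCluster -c=cluster2 -nbTasks=1 -flp=13000
        System.out.println("cluster2: " + portsFor(1, 13000));
    }
}
```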
+
+<p>You can check the status of the application, nodes and streams with the &#8220;status&#8221; command:</p>
+
+<pre><code>./s4 status
+</code></pre>
+
+<p>Now what we need is some input!</p>
+
+<p>We can get input through an adapter, i.e. an S4 app that converts an external stream into S4 events and injects those events into S4 clusters. In the sample application, the adapter is a very basic class that extends App, listens on an input socket on port 15000, and converts each received line of characters into a generic S4 event, storing the line data in a &#8220;name&#8221; field. We specify:</p>
+
+<ul>
+  <li>the adapter class</li>
+  <li>the name of the output stream</li>
+  <li>the cluster on which to deploy this app</li>
+</ul>
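+<p>The adapter behavior described above can be sketched in plain Java. This is not the HelloInputAdapter source: the <code>LineAdapterSketch</code> class is hypothetical, a <code>Map</code> stands in for a generic S4 event, and a <code>Consumer</code> stands in for emitting events on the output stream. Each line read (from a socket on port 15000, or from any reader) becomes one event with a &#8220;name&#8221; field.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical illustration, not the HelloInputAdapter implementation.
final class LineAdapterSketch {

    // Turns every line into an event whose "name" field holds the line.
    // The Consumer stands in for emitting on the output stream ("names").
    static int pump(BufferedReader in, Consumer<Map<String, String>> sink) throws IOException {
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            Map<String, String> event = new HashMap<>();
            event.put("name", line);
            sink.accept(event);
            count++;
        }
        return count;
    }

    // Accepts a single client on the given port (15000 in the sample) and pumps its lines.
    static int serveOnce(int port, Consumer<Map<String, String>> sink) throws IOException {
        try (ServerSocket server = new ServerSocket(port);
             Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
            return pump(in, sink);
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline demo with an in-memory reader; serveOnce(15000, ...) would listen on a socket instead.
        BufferedReader demo = new BufferedReader(new StringReader("Bob\nAlice\n"));
        pump(demo, event -> System.out.println("emit " + event));
    }
}
```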
+
+<p>For easier testing, we provide a way to start a node running an adapter app without having to package that app.</p>
+
+<ul>
+  <li>
+    <p>First, we need to define a new S4 subcluster for that app:</p>
+
+    <pre><code>  S4:myApp$ ./s4 newCluster -c=cluster2 -nbTasks=1 -flp=13000
+</code></pre>
+  </li>
+  <li>
+    <p>Then we configure the application:</p>
+    <ul>
+      <li>we specify the adapter class (app class)</li>
+      <li>we use &#8220;names&#8221; to identify the output stream (this is the same name that the myApp app uses as input)</li>
+      <li><em>there is also a -s4r parameter, indicating where to fetch the application package from. We don&#8217;t need it here, since we skip that step and use a special &#8220;adapter&#8221; tool.</em></li>
+    </ul>
+
+    <pre><code>  ./s4 deploy -appClass=hello.HelloInputAdapter -p=s4.adapter.output.stream=names -c=cluster2 -appName=adapter
+</code></pre>
+  </li>
+  <li>
+    <p>Then we simply start the adapter (there is no need to package or copy an s4r archive).
+Note that the adapter command must be run from the root of your S4 project (the myApp directory in our case).</p>
+
+    <pre><code>  ./s4 adapter -c=cluster2
+</code></pre>
+  </li>
+  <li>
+    <p>Now let&#8217;s provide some data to the external stream (our adapter listens on port 15000):</p>
+
+    <pre><code>  S4:~$ echo "Bob" | nc localhost 15000
+</code></pre>
+  </li>
+  <li>
+    <p>One of the nodes should print the following in its console:</p>
+
+    <pre><code>  Hello Bob!
+</code></pre>
+  </li>
+</ul>
+
+<blockquote>
+  <p>If you keep sending messages, the nodes will alternately display the &#8220;hello&#8221; messages, because by default the adapter app sends keyless events on the &#8220;names&#8221; stream in a round-robin fashion.</p>
+</blockquote>
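+<p>Round-robin dispatch of keyless events can be pictured with the small sketch below. <code>RoundRobinSketch</code> is a hypothetical illustration, not S4&#8217;s dispatcher: with 2 partitions, successive events go to partitions 0, 1, 0, 1, and so on, which is why the &#8220;Hello&#8221; messages alternate between the two nodes.</p>

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not S4's dispatcher implementation.
final class RoundRobinSketch {
    private final int nbPartitions;
    // AtomicInteger lets several threads share the counter safely.
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(int nbPartitions) {
        this.nbPartitions = nbPartitions;
    }

    // Partition for the next keyless event: 0, 1, ..., n-1, 0, ...
    // floorMod keeps the result valid even after the counter overflows.
    int nextPartition() {
        return Math.floorMod(next.getAndIncrement(), nbPartitions);
    }

    public static void main(String[] args) {
        RoundRobinSketch dispatcher = new RoundRobinSketch(2); // cluster1 has 2 partitions
        for (String name : new String[] {"Bob", "Alice", "Carol"}) {
            System.out.println(name + " -> partition " + dispatcher.nextPartition());
        }
    }
}
```

+<p>Compare this with keyed events: with a key, the target partition depends only on the key, so repeated events with the same key would always reach the same node.</p>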
+
+<h2 id="what-happened">What happened?</h2>
+
+<p>The following figure illustrates the steps we have taken. In our example, the local file system is used as the S4 application repository.</p>
+
+<p><img src="/images/doc/0.6.0/sampleAppDeployment.png" alt="image" /></p>
+
+<hr />
+
+<h1 id="run-the-twitter-trending-example">Run the Twitter trending example</h1>
+
+<p>Let&#8217;s have a look at another application, which computes trending Twitter topics by listening to the spritzer stream from the Twitter API. This application was adapted from a previous example in S4 0.3.</p>
+
+<h2 id="overview">Overview</h2>
+
+<p>This application is divided into:</p>
+
+<ul>
+  <li>twitter-counter, in test-apps/twitter-counter/: extracts topics from tweets and maintains a count of the most popular ones, which is periodically dumped to disk</li>
+  <li>twitter-adapter, in test-apps/twitter-adapter/: listens to the feed from Twitter, converts status text into S4 events, and passes them to the &#8220;RawStatus&#8221; stream</li>
+</ul>
+
+<p>Have a look at the code in these directories. You&#8217;ll note that:</p>
+
+<ul>
+  <li>the build.gradle file must be tailored to include new dependencies (twitter4j libs in twitter-adapter)</li>
+  <li>events are partitioned through various keys</li>
+</ul>
+
+<h2 id="run-it">Run it!</h2>
+
+<blockquote>
+  <p>Note: You need a twitter4j.properties file in your home directory with the following content (debug is optional):</p>
+</blockquote>
+
+<pre><code>	debug=true
+	user=&lt;a twitter username&gt;
+	password=&lt;matching password&gt;
+</code></pre>
+
+<ul>
+  <li>
+    <p>Start a ZooKeeper instance. From the S4 base directory, run:</p>
+
+    <pre><code>  ./s4 zkServer
+</code></pre>
+  </li>
+  <li>
+    <p>Define 2 clusters: one for deploying the twitter-counter app, and one for the adapter app:</p>
+
+    <pre><code>  ./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000; ./s4 newCluster -c=cluster2 -nbTasks=1 -flp=13000
+</code></pre>
+  </li>
+  <li>
+    <p>Start 2 app nodes (you may want to start each node in a separate console):</p>
+
+    <pre><code>  ./s4 node -c=cluster1
+  ./s4 node -c=cluster1
+</code></pre>
+  </li>
+  <li>
+    <p>Start 1 node for the adapter app:</p>
+
+    <pre><code>  ./s4 node -c=cluster2 -p=s4.adapter.output.stream=RawStatus
+</code></pre>
+  </li>
+  <li>
+    <p>Deploy the twitter-counter app (you may also first build the s4r and then publish it, as described in the previous section):</p>
+
+    <pre><code>  ./s4 deploy -appName=twitter-counter -c=cluster1 -b=`pwd`/test-apps/twitter-counter/build.gradle
+</code></pre>
+  </li>
+  <li>
+    <p>Deploy the twitter-adapter app. In this example, we don&#8217;t specify the adapter&#8217;s app class directly; instead, we use the regular deployment approach for apps (remember, the adapter is also an app):</p>
+
+    <pre><code>  ./s4 deploy -appName=twitter-adapter -c=cluster2 -b=`pwd`/test-apps/twitter-adapter/build.gradle
+</code></pre>
+  </li>
+  <li>
+    <p>Observe the current 10 most popular topics in the file TopNTopics.txt. The file is updated at regular intervals and only lists topics with at least 10 occurrences, so you may have to wait a little before it is updated:</p>
+
+    <pre><code>  tail -f TopNTopics.txt
+</code></pre>
+  </li>
+  <li>
+    <p>You may also check the status of the S4 nodes with:</p>
+
+    <pre><code>  ./s4 status
+</code></pre>
+  </li>
+</ul>
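+<p>The selection behind TopNTopics.txt (the 10 most popular topics, ignoring topics seen fewer than 10 times) can be sketched as follows. <code>TopNTopicsSketch</code> is a hypothetical, single-process illustration rather than the twitter-counter code: in the real app, counts are maintained by PEs across the cluster and dumped periodically, and how topics are extracted from tweets is not modeled here.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration, not the twitter-counter implementation.
final class TopNTopicsSketch {

    // Counts how many times each topic occurs.
    static Map<String, Integer> count(List<String> topics) {
        Map<String, Integer> counts = new HashMap<>();
        for (String topic : topics) {
            counts.merge(topic, 1, Integer::sum);
        }
        return counts;
    }

    // Keeps topics seen at least minCount times, sorts them by descending count
    // (ties broken alphabetically so the output is deterministic), and keeps the first n.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n, int minCount) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                .sorted((a, b) -> {
                    int byCount = Integer.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> topics = new ArrayList<>();
        topics.addAll(Collections.nCopies(15, "java"));
        topics.addAll(Collections.nCopies(12, "s4"));
        topics.addAll(Collections.nCopies(3, "rare")); // below the threshold of 10
        // Top 10 topics with at least 10 occurrences.
        System.out.println(topN(count(topics), 10, 10));
    }
}
```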
+
+<hr />
+
+<h1 id="what-next">What next?</h1>
+
+<p>You have now seen some basic applications, you know how to run them, and how to get events into the system. You may now try to code your own apps with your own data.</p>
+
+<p><a href="../application_dependencies">This page</a> will help you specify your own dependencies.</p>
+
+<p>There are more parameters available for the scripts (typing the name of a task lists its options). In particular, for distributed deployments you&#8217;ll need to pass the ZooKeeper connection string when you start the nodes.</p>
+
+<p>You may also customize the communication and the core layers of S4 by tweaking configuration files and modules.</p>
+
+<p>Last, the <a href="http://people.apache.org/~mmorel/apache-s4-0.6.0-incubating-doc/javadoc/">javadoc</a> will help you when writing applications.</p>
+
+<p>We hope this will help you get started quickly, and remember: we&#8217;re happy to help!</p></div>
+    </div>
+    <div id='footer'>
+      <div class='container'>
+        <span class='copyright'>Apache S4 - Copyright 2013 The Apache Software Foundation</span>
+      </div>
+    </div>
+  </body>
+</html>

Added: incubator/s4/site/images/doc/0.6.0/S4_hierarchical_archi.png
URL: http://svn.apache.org/viewvc/incubator/s4/site/images/doc/0.6.0/S4_hierarchical_archi.png?rev=1449398&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/s4/site/images/doc/0.6.0/S4_hierarchical_archi.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/s4/site/images/doc/0.6.0/checkpointing-framework.png
URL: http://svn.apache.org/viewvc/incubator/s4/site/images/doc/0.6.0/checkpointing-framework.png?rev=1449398&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/s4/site/images/doc/0.6.0/checkpointing-framework.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/s4/site/images/doc/0.6.0/failover.png
URL: http://svn.apache.org/viewvc/incubator/s4/site/images/doc/0.6.0/failover.png?rev=1449398&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/s4/site/images/doc/0.6.0/failover.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/s4/site/images/doc/0.6.0/s4_node_layers.png
URL: http://svn.apache.org/viewvc/incubator/s4/site/images/doc/0.6.0/s4_node_layers.png?rev=1449398&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/s4/site/images/doc/0.6.0/s4_node_layers.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/s4/site/images/doc/0.6.0/sampleAppDeployment.png
URL: http://svn.apache.org/viewvc/incubator/s4/site/images/doc/0.6.0/sampleAppDeployment.png?rev=1449398&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/s4/site/images/doc/0.6.0/sampleAppDeployment.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/s4/site/images/doc/0.6.0/sources/s4_node_layers.odg
URL: http://svn.apache.org/viewvc/incubator/s4/site/images/doc/0.6.0/sources/s4_node_layers.odg?rev=1449398&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/s4/site/images/doc/0.6.0/sources/s4_node_layers.odg
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream