Posted to commits@samza.apache.org by cr...@apache.org on 2013/09/02 18:35:12 UTC

svn commit: r1519471 [3/3] - in /incubator/samza/site: ./ contribute/ learn/documentation/0.7.0/api/ learn/documentation/0.7.0/api/javadocs/ learn/documentation/0.7.0/api/javadocs/org/apache/samza/ learn/documentation/0.7.0/api/javadocs/org/apache/samz...

Modified: incubator/samza/site/learn/documentation/0.7.0/container/task-runner.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/task-runner.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/task-runner.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/task-runner.html Mon Sep  2 16:35:10 2013
@@ -65,6 +65,8 @@
         <div class="body">
           <h2>TaskRunner</h2>
 
+<!-- TODO: Is TaskRunner still appropriate terminology to use (appears to be a combo of SamzaContainer and TaskInstance in the code)? -->
+
 <p>The TaskRunner is Samza&#39;s stream processing container. It is responsible for managing the startup, execution, and shutdown of one or more StreamTask instances.</p>
 
 <p>When the a TaskRunner starts up, it does the following:</p>
@@ -85,8 +87,8 @@
 <h3>Tasks and Partitions</h3>
 
 <p>When the TaskRunner starts, it creates an instance of the StreamTask that you&#39;ve written. If the StreamTask implements the InitableTask interface, the TaskRunner will also call the init() method.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">public interface InitableTask {
-  void init(Config config, TaskContextPartition context);
+<div class="highlight"><pre><code class="text language-text">public interface InitableTask {
+  void init(Config config, TaskContext context);
 }
 </code></pre></div>
 <p>It doesn&#39;t just do this once, though. It creates the StreamTask once for each partition in your Samza job. If your Samza job has ten partitions, there will be ten instantiations of your StreamTask: one for each partition. The StreamTask instance for partition one will receive all messages for partition one, the instance for partition two will receive all messages for partition two, and so on.</p>
@@ -97,7 +99,7 @@
 
 <p>If a Samza job has more than one input stream, then the number of partitions for the Samza job will be the maximum number of partitions across all input streams. For example, if a Samza job is reading from PageView event, which has 12 partitions, and ServiceMetricEvent, which has 14 partitions, then the Samza job would have 14 partitions (0 through 13).</p>
 
-<p>When the TaskRunner&#39;s StreamConsumer threads are reading messages from each input stream partition, the messages that it receives are tagged with the partition number that it came from. Each message is fed to the StreamTask instance that corresponds to the message&#39;s partition. This design has two important properties. When a Samza job has more than one input stream, and those streams have an imbalanced number of partitions (e.g. one has 12 partitions and the other has 14), then some of your StreamTask instances will not receive messages from all streams. In the PageViewEvent/ServiceMetricEvent example, the last two StreamTask instances would only receive messages from the ServiceMetricEvent topic (partitions 12 and 13). The lower 12 instances would receive messages from both streams. If your Samza job is reading more than one input stream, you probably want all input streams to have the same number of partitions, especially if you&#39;re trying to join streams together. The second important property is that Samza assumes that a stream&#39;s partition count will never change. No partition splitting is supported. If an input stream has N partitions, it is expected that it has had, and will always have N partitions. If you want to re-partition, you must read messages from the stream, and write them out to a new stream that has the number of partitions that you want. For example you could read messages from PageViewEvent, and write them to PageViewEventRepartition, which could have 14 partitions. If you did this, then you would achieve balance between PageViewEventRepartition and ServiceMetricEvent.</p>
+<p>When the TaskRunner&#39;s StreamConsumer threads are reading messages from each input stream partition, the messages that it receives are tagged with the partition number that it came from. Each message is fed to the StreamTask instance that corresponds to the message&#39;s partition. This design has two important properties. When a Samza job has more than one input stream, and those streams have an imbalanced number of partitions (e.g. one has 12 partitions and the other has 14), then some of your StreamTask instances will not receive messages from all streams. In the PageViewEvent/ServiceMetricEvent example, the last two StreamTask instances would only receive messages from the ServiceMetricEvent topic (partitions 12 and 13). The lower 12 instances would receive messages from both streams. If your Samza job is reading more than one input stream, you probably want all input streams to have the same number of partitions, especially if you&#39;re trying to join streams together. The second important property is that Samza assumes that a stream&#39;s partition count will never change. No partition splitting is supported. If an input stream has N partitions, it is expected that it has always had, and will always have N partitions. If you want to re-partition, you must read messages from the stream, and write them out to a new stream that has the number of partitions that you want. For example you could read messages from PageViewEvent, and write them to PageViewEventRepartition, which could have 14 partitions. If you did this, then you would achieve balance between PageViewEventRepartition and ServiceMetricEvent.</p>
 
 <p>This design is important because it guarantees that any state that your StreamTask keeps in memory will be isolated on a per-partition basis. For example, if you refer back to the page-view counting job we used as an example in the <a href="../introduction/architecture.html">Architecture</a> section, we might have a Map&lt;Integer, Integer&gt; map that keeps track of page view counts per-member ID. If we were to have just one StreamTask per Samza job, for instance, then the member ID counts from different partitions would be inter-mingled into the same map. This inter-mingling would prevent us from moving partitions between processes or machines, which is something that we want to do with YARN. You can imagine a case where you started with one TaskRunner in a single YARN container. Your Samza job might be unable to keep up with only one container, so you ask for a second YARN container to put some of the StreamTask partitions. In such a case, how would we split the counts such that one container gets only member ID counts for the partitions in charge of? This is effectively impossible if we&#39;ve inter-mingled the StreamTask&#39;s state together. This is why we isolate StreamTask instances on a per-partition basis: to make partition migration possible.</p>
 
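Editor's illustration (not part of the commit): the partition rules described in the task-runner.html hunk above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Samza's implementation: the class PartitionAssignmentSketch and its two methods are hypothetical names, and only the stream names and partition counts come from the text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Samza: models how many StreamTask
// instances a job gets and which input streams reach each instance.
class PartitionAssignmentSketch {

    // The job's partition count is the maximum partition count across all inputs.
    static int jobPartitionCount(Map<String, Integer> partitionsPerStream) {
        int max = 0;
        for (int count : partitionsPerStream.values()) {
            max = Math.max(max, count);
        }
        return max;
    }

    // A stream feeds the task for a partition only if the stream has that partition.
    static List<String> streamsForPartition(Map<String, Integer> partitionsPerStream, int partition) {
        List<String> streams = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : partitionsPerStream.entrySet()) {
            if (partition < entry.getValue()) {
                streams.add(entry.getKey());
            }
        }
        return streams;
    }

    public static void main(String[] args) {
        Map<String, Integer> inputs = new LinkedHashMap<>();
        inputs.put("PageViewEvent", 12);
        inputs.put("ServiceMetricEvent", 14);
        System.out.println(jobPartitionCount(inputs));        // 14
        System.out.println(streamsForPartition(inputs, 3));   // [PageViewEvent, ServiceMetricEvent]
        System.out.println(streamsForPartition(inputs, 13));  // [ServiceMetricEvent]
    }
}
```

With the 12/14 example, partitions 12 and 13 see only ServiceMetricEvent, which is why the text recommends equal partition counts when joining streams.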

Modified: incubator/samza/site/learn/documentation/0.7.0/container/windowing.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/windowing.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/windowing.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/windowing.html Mon Sep  2 16:35:10 2013
@@ -65,15 +65,17 @@
         <div class="body">
           <h2>Windowing</h2>
 
-<p>Referring back to the, &quot;count PageViewEvent by member ID,&quot; example in the <a href="../introduction/architecture.html">Architecture</a> section, one thing that we left out was what we do with the counts. Let&#39;s say that the Samza job wants to update the member ID counts in a database once every minute. Here&#39;s how it would work. The Samza job that does the counting would keep a Map&lt;Integer, Integer&gt; in memory, which maps member IDs to page view counts. Every time a message arrives, the job would take the member ID in the PageViewEvent, and use it to increment the member ID&#39;s count in the in-memory map. Then, once a minute, the StreamTask would update the database (total<em>count += current</em>count) for every member ID in the map, and then reset the count map.</p>
+<p>Referring back to the &quot;count PageViewEvent by member ID&quot; example in the <a href="../introduction/architecture.html">Architecture</a> section, one thing that we left out was what we do with the counts. Let&#39;s say that the Samza job wants to update the member ID counts in a database once every minute. Here&#39;s how it would work. The Samza job that does the counting would keep a Map&lt;Integer, Integer&gt; in memory, which maps member IDs to page view counts. Every time a message arrives, the job would take the member ID in the PageViewEvent, and use it to increment the member ID&#39;s count in the in-memory map. Then, once a minute, the StreamTask would update the database (total<em>count += current</em>count) for every member ID in the map, and then reset the count map.</p>
 
 <p>Windowing is how we achieve this. If a StreamTask implements the WindowableTask interface, the TaskRunner will call the window() method on the task over a configured interval.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">public interface WindowableTask {
+<div class="highlight"><pre><code class="text language-text">public interface WindowableTask {
   void window(MessageCollector collector, TaskCoordinator coordinator);
 }
 </code></pre></div>
 <p>If you choose to implement the WindowableTask interface, you can use the Samza job&#39;s configuration to define how often the TaskRunner should call your window() method. In the PageViewEvent example (above), you would define it to flush every 60000 milliseconds (60 seconds).</p>
 
+<h2><a href="event-loop.html">Event Loop &raquo;</a></h2>
+
 
         </div>
         </div>
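
Editor's illustration (not part of the commit): the count-then-flush pattern described in the windowing.html hunk above might look roughly like this. It is a sketch that deliberately avoids Samza's classes: PageViewCounterSketch, onPageView, and the in-memory "database" map are hypothetical stand-ins for a StreamTask's process() and window() methods and for a real database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a windowed StreamTask; it does not use Samza's classes.
class PageViewCounterSketch {
    // Stand-in for the external database of total counts per member ID.
    final Map<Integer, Integer> database = new HashMap<>();
    // In-memory counts accumulated since the last window() call.
    private final Map<Integer, Integer> counts = new HashMap<>();

    // Called once per incoming PageViewEvent (plays the role of process()).
    void onPageView(int memberId) {
        counts.merge(memberId, 1, Integer::sum);
    }

    // Called on the configured interval (plays the role of window()).
    void window() {
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            // total_count += current_count
            database.merge(entry.getKey(), entry.getValue(), Integer::sum);
        }
        counts.clear(); // reset the count map for the next interval
    }

    public static void main(String[] args) {
        PageViewCounterSketch task = new PageViewCounterSketch();
        task.onPageView(7);
        task.onPageView(7);
        task.onPageView(9);
        task.window();
        System.out.println(task.database.get(7)); // 2
    }
}
```

Resetting the in-memory map after each flush keeps every window's contribution to the totals independent of the previous one.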

Modified: incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html Mon Sep  2 16:35:10 2013
@@ -89,17 +89,17 @@
 
 <p><img src="/img/0.7.0/learn/documentation/introduction/samza-hadoop.png" alt="diagram-medium"></p>
 
-<p>Before going in-depth on each of these three layers, it should be noted that Samza supports is not limited to these systems. Both Samza&#39;s execution and streaming layer are pluggable, and allow developers to implement alternatives if they prefer.</p>
+<p>Before going in-depth on each of these three layers, it should be noted that Samza&#39;s support is not limited to these systems. Both Samza&#39;s execution and streaming layer are pluggable, and allow developers to implement alternatives if they prefer.</p>
 
 <h3>Kafka</h3>
 
 <p><a href="http://kafka.apache.org/">Kafka</a> is a distributed pub/sub and message queueing system that provides at-least once messaging guarantees, and highly available partitions (i.e. a stream&#39;s partitions will be available, even if a machine goes down).</p>
 
-<p>In Kafka, each stream is called a &quot;topic&quot;. Each topic is partitioned up, to make things scalable. When a &quot;producer&quot; sends a message to a topic, the producer provides a key, which is used to determine which partition the message should be sent to. Kafka &quot;brokers&quot;, each of which are in charge of some partitions, receive the messages that the producer sends, and stores them on their disk in a log file. Kafka &quot;consumers&quot; can then read from a topic by getting messages from all of a topic&#39;s partitions.</p>
+<p>In Kafka, each stream is called a &quot;topic&quot;. Each topic is partitioned, to make things scalable. When a &quot;producer&quot; sends a message to a topic, the producer provides a key, which is used to determine which partition the message should be sent to. Kafka &quot;brokers&quot;, each of which are in charge of some partitions, receive and store the messages that the producer sends. Kafka &quot;consumers&quot; can then read from a topic by getting messages from all of a topic&#39;s partitions.</p>
 
-<p>This has some interesting properties. First, all messages partitioned by the same key are guaranteed to be in the same Kafka topic partition. This means, if you wish to read all messages for a specific member ID, you only have to read the messages from the partition that the member ID is on, not the whole topic (assuming the topic is partitioned by member ID). Second, since a Kafka broker&#39;s file is a log, you can reference any point in the log file using an &quot;offset&quot;. This offset determines where a consumer is in a topic/partition pair. After every message a consumer reads from a topic/partition pair, the offset is incremented.</p>
+<p>This has some interesting properties. First, all messages partitioned by the same key are guaranteed to be in the same Kafka topic partition. This means, if you wish to read all messages for a specific member ID, you only have to read the messages from the partition that the member ID is on, not the whole topic (assuming the topic is partitioned by member ID). Second, since a Kafka broker&#39;s log is a file, you can reference any point in the log file using an &quot;offset&quot;. This offset determines where a consumer is in a topic/partition pair. After every message a consumer reads from a topic/partition pair, the offset is incremented.</p>
 
-<p>For more details on Kafka, see Kafka&#39;s <a href="http://kafka.apache.org/introduction.html">introduction</a> and <a href="http://kafka.apache.org/design.html">design</a> pages.</p>
+<p>For more details on Kafka, see Kafka&#39;s <a href="http://kafka.apache.org/documentation.html">documentation</a> pages.</p>
 
 <h3>YARN</h3>
 
@@ -111,7 +111,7 @@
 <li><strong>Application</strong>: I want to run command X on two machines with 512M memory</li>
 <li><strong>YARN</strong>: Cool, where&#39;s your code?</li>
 <li><strong>Application</strong>: http://path.to.host/jobs/download/my.tgz</li>
-<li><strong>YARN</strong>: I&#39;m running your job on node-1.grid and node-1.grid</li>
+<li><strong>YARN</strong>: I&#39;m running your job on node-1.grid and node-2.grid</li>
 </ol>
 
 <p>Samza uses YARN to manage:</p>
@@ -129,7 +129,7 @@
 
 <h4>YARN Architecture</h4>
 
-<p>YARN has three important pieces: a ResourceManager, a NodeManager, and an ApplicationMaster. In a YARN grid, every computer runs a NodeManager, which is responsible for running processes on the local machine. A ResourceManager talks to all of the NodeManagers to tell it what to run. Applications, in turn, talk to the ResourceManager when they wish to run something on the cluster. The flow, when starting a new application, goes from user application to YARN RM, to YARN NM. The third piece, the ApplicationMaster, is actually application-specific code that runs in the YARN cluster. It&#39;s responsible for managing the application&#39;s workload, asking for containers (usually, UNIX processes), and handling notifications when one of its containers fails.</p>
+<p>YARN has three important pieces: a ResourceManager, a NodeManager, and an ApplicationMaster. In a YARN grid, every computer runs a NodeManager, which is responsible for running processes on the local machine. A ResourceManager talks to all of the NodeManagers to tell them what to run. Applications, in turn, talk to the ResourceManager when they wish to run something on the cluster. The flow, when starting a new application, goes from user application to YARN RM, to YARN NM. The third piece, the ApplicationMaster, is actually application-specific code that runs in the YARN cluster. It&#39;s responsible for managing the application&#39;s workload, asking for containers (usually UNIX processes), and handling notifications when one of its containers fails.</p>
 
 <h4>Samza and YARN</h4>
 
@@ -137,7 +137,7 @@
 
 <p><img src="/img/0.7.0/learn/documentation/introduction/samza-yarn-integration.png" alt="diagram-small"></p>
 
-<p>The Samza client talks to the YARN RM when it wants to start a new Samza job. The YARN RM talks to a YARN NM to allocate space on the cluster for Samza&#39;s ApplicationMaster. Once the NM allocates space, it starts the Samza AM. After the Samza AM starts, it asks the YARN RM for one, or more, YARN containers to run Samza <a href="../container/task-runner.html">TaskRunners</a>. Again, the RM works with NMs to allocate space for the containers. Once the space has been allocated, the NMs start the Samza containers.</p>
+<p>The Samza client talks to the YARN RM when it wants to start a new Samza job. The YARN RM talks to a YARN NM to allocate space on the cluster for Samza&#39;s ApplicationMaster. Once the NM allocates space, it starts the Samza AM. After the Samza AM starts, it asks the YARN RM for one or more YARN containers to run Samza <a href="../container/task-runner.html">TaskRunners</a>. Again, the RM works with NMs to allocate space for the containers. Once the space has been allocated, the NMs start the Samza containers.</p>
 
 <h3>Samza</h3>
 
@@ -145,7 +145,7 @@
 
 <p><img src="/img/0.7.0/learn/documentation/introduction/samza-yarn-kafka-integration.png" alt="diagram-small"></p>
 
-<p>The Samza client uses YARN to run a Samza job. The Samza <a href="../container/task-runner.html">TaskRunners</a> run in one, or more, YARN containers, and execute user-written Samza <a href="../api/overview.html">StreamTasks</a>. The input and output for the Samza StreamTasks come from Kafka brokers that are (usually) co-located on the same machines as the YARN NMs.</p>
+<p>The Samza client uses YARN to run a Samza job. The Samza <a href="../container/task-runner.html">TaskRunners</a> run in one or more YARN containers, and execute user-written Samza <a href="../api/overview.html">StreamTasks</a>. The input and output for the Samza StreamTasks come from Kafka brokers that are (usually) co-located on the same machines as the YARN NMs.</p>
 
 <h3>Example</h3>
 
@@ -155,7 +155,7 @@
 
 <p>The input topic is partitioned using Kafka. Each Samza process reads messages from one or more of the input topic&#39;s partitions, and emits them back out to a different Kafka topic. Each output message is keyed by the message&#39;s member ID attribute, and this key is mapped to one of the topic&#39;s partitions (usually by hashing the key, and modding by the number of partitions in the topic). The Kafka brokers receive these messages, and buffer them on disk until the second job (the counting job on the bottom of the diagram) reads the messages, and increments its counters.</p>
 
-<p>There are some neat things to consider about this example. First, we&#39;re leveraging the fact that Kafka topics are inherently partitioned. This lets us run one or more Samza processes, and assign them each some partitions to read from. Second, since we&#39;re guaranteed that, for a given key, all messages will be on the same partition, we can actually split up the aggregation (counting). For example, if the first job&#39;s output had four partitions, we could assign two partitions to the first count process, and the other two partitions to the second count process. We&#39;d be guaranteed that for any give member ID, all of their messages will be consumed by either the first process or the second, but not both. This means we&#39;ll get accurate counts, even when partitioning. Third, the fact that we&#39;re using Kafka, which buffers messages on its brokers, also means that we don&#39;t have to worry as much about failures. If a process or machine fails, we can use YARN to start the process on another machine. When the process starts up again, it can get its last offset, and resume reading messages where it left off.</p>
+<p>There are some neat things to consider about this example. First, we&#39;re leveraging the fact that Kafka topics are inherently partitioned. This lets us run one or more Samza processes, and assign them each some partitions to read from. Second, since we&#39;re guaranteed that for a given key all messages will be on the same partition, we can actually split up the aggregation (counting). For example, if the first job&#39;s output had four partitions, we could assign two partitions to the first count process, and the other two partitions to the second count process. We&#39;d be guaranteed that for any give member ID, all of their messages will be consumed by either the first process or the second, but not both. This means we&#39;ll get accurate counts, even when partitioning. Third, the fact that we&#39;re using Kafka, which buffers messages on its brokers, also means that we don&#39;t have to worry as much about failures. If a process or machine fails, we can use YARN to start the process on another machine. When the process starts up again, it can get its last offset and resume reading messages where it left off.</p>
 
 <h2><a href="../comparisons/introduction.html">Comparison Introduction &raquo;</a></h2>
 
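Editor's illustration (not part of the commit): the architecture.html text above says a message key is mapped to a partition "usually by hashing the key, and modding by the number of partitions". A minimal sketch of that idea follows. KeyPartitionerSketch and partitionFor are hypothetical names, and Java's hashCode() is an assumption used for illustration; Kafka's own partitioner may use a different hash function.

```java
// Hypothetical helper: maps a message key to a partition by hash-and-mod.
class KeyPartitionerSketch {

    // Math.floorMod keeps the result in [0, numPartitions) even for negative hashes.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int memberId = 42;
        // The same key always lands on the same partition.
        System.out.println(partitionFor(memberId, 4)); // 2
        System.out.println(partitionFor(memberId, 4)); // 2
    }
}
```

Because the mapping depends only on the key and the partition count, every message for a given member ID is consumed by the same counting process, which is what makes the per-partition counts accurate.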

Modified: incubator/samza/site/learn/documentation/0.7.0/introduction/background.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/introduction/background.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/introduction/background.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/introduction/background.html Mon Sep  2 16:35:10 2013
@@ -96,7 +96,7 @@
 <p>Samza is a stream processing framework with the following features:</p>
 
 <ul>
-<li><strong>Simpe API:</strong> Samza provides a very simple call-back based &quot;process message&quot; API.</li>
+<li><strong>Simple API:</strong> Samza provides a very simple call-back based &quot;process message&quot; API.</li>
 <li><strong>Managed state:</strong> Samza manages snapshotting and restoration of a stream processor&#39;s state. Samza will restore a stream processor&#39;s state to a snapshot consistent with the processor&#39;s last read messages when the processor is restarted. Samza is built to handle large amounts of state (even many gigabytes per partition).</li>
 <li><strong>Fault tolerance:</strong> Samza will work with YARN to transparently migrate your tasks whenever a machine in the cluster fails.</li>
 <li><strong>Durability:</strong> Samza uses Kafka to guarantee that no messages will ever be lost.</li>

Modified: incubator/samza/site/learn/documentation/0.7.0/introduction/concepts.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/introduction/concepts.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/introduction/concepts.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/introduction/concepts.html Mon Sep  2 16:35:10 2013
@@ -97,9 +97,9 @@
 
 <p>The task processes messages from each of its input partitions <em>in order by offset</em>. There is no defined ordering between partitions.</p>
 
-<p>The position of the task in its input partitions can be represented by set of offsets, one for each partition.</p>
+<p>The position of the task in its input partitions can be represented by a set of offsets, one for each partition.</p>
 
-<p>The number of tasks a job has is fixed and does not change (though the computational resources assigned to the job may go up and down). The number of tasks a job has also determines the maximum parallelism of the job as each task processes messages sequentially. There cannot be more tasks than input partitions (or there would be some task with no input).</p>
+<p>The number of tasks a job has is fixed and does not change (though the computational resources assigned to the job may go up and down). The number of tasks a job has also determines the maximum parallelism of the job as each task processes messages sequentially. There cannot be more tasks than input partitions (or there would be some tasks with no input).</p>
 
 <p>The partitions assigned to a task will never change: if a task is on a machine that fails the task will be restarted elsewhere still consuming the same stream partitions.</p>
 

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html Mon Sep  2 16:35:10 2013
@@ -66,7 +66,7 @@
           <h2>Configuration</h2>
 
 <p>All Samza jobs have a configuration file that defines the job. A very basic configuration file looks like this:</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text"># Job
+<div class="highlight"><pre><code class="text language-text"># Job
 job.factory.class=samza.job.local.LocalJobFactory
 job.name=hello-world
 

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html Mon Sep  2 16:35:10 2013
@@ -66,19 +66,19 @@
           <h2>JobRunner</h2>
 
 <p>Samza jobs are started using a script called run-job.sh.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">samza-example/target/bin/run-job.sh \
+<div class="highlight"><pre><code class="text language-text">samza-example/target/bin/run-job.sh \
   --config-factory=samza.config.factories.PropertiesConfigFactory \
   --config-path=file://$PWD/config/hello-world.properties
 </code></pre></div>
 <p>You provide two parameters to the run-job.sh script. One is the config location, and the other is a factory class that is used to read your configuration file. The run-job.sh script is actually executing a Samza class called JobRunner. The JobRunner uses your ConfigFactory to get a Config object from the config path.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">public interface ConfigFactory {
+<div class="highlight"><pre><code class="text language-text">public interface ConfigFactory {
   Config getConfig(URI configUri);
 }
 </code></pre></div>
 <p>The Config object is just a wrapper around Map<String, String>, with some nice helper methods. Out of the box, Samza ships with the PropertiesConfigFactory, but developers can implement any kind of ConfigFactory they wish.</p>
 
 <p>Once the JobRunner gets your configuration, it gives your configuration to the StreamJobFactory class defined by the &quot;job.factory&quot; property. Samza ships with two job factory implementations: LocalJobFactory and YarnJobFactory. The StreamJobFactory&#39;s responsibility is to give the JobRunner a job that it can run.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">public interface StreamJob {
+<div class="highlight"><pre><code class="text language-text">public interface StreamJob {
   StreamJob submit();
 
   StreamJob kill();
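
Editor's illustration (not part of the commit): the job-runner.html text above describes Config as a wrapper around a map of string keys to string values, read from a properties file. The sketch below shows one way such a file could be turned into a map using java.util.Properties. PropertiesConfigSketch and load are hypothetical names; the real ConfigFactory takes a URI and returns a Config object, neither of which is modelled here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: reads properties-format text into a String-to-String map.
class PropertiesConfigSketch {

    static Map<String, String> load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader); // parses key=value lines and skips '#' comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> config = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            config.put(name, props.getProperty(name));
        }
        return config;
    }

    public static void main(String[] args) {
        String text = "# Job\n"
                + "job.factory.class=samza.job.local.LocalJobFactory\n"
                + "job.name=hello-world\n";
        Map<String, String> config = load(new StringReader(text));
        System.out.println(config.get("job.name")); // hello-world
    }
}
```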

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html Mon Sep  2 16:35:10 2013
@@ -70,7 +70,7 @@
 <h3>Log4j</h3>
 
 <p>The <a href="/startup/hello-samza/0.7.0">hello-samza</a> project shows how to use <a href="http://logging.apache.org/log4j/1.2/">log4j</a> with Samza. To turn on log4j logging, you just need to make sure slf4j-log4j12 is in your Samza TaskRunner&#39;s classpath. In Maven, this can be done by adding the following dependency to your Samza package project.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">&lt;dependency&gt;
+<div class="highlight"><pre><code class="text language-text">&lt;dependency&gt;
   &lt;groupId&gt;org.slf4j&lt;/groupId&gt;
   &lt;artifactId&gt;slf4j-log4j12&lt;/artifactId&gt;
   &lt;scope&gt;runtime&lt;/scope&gt;
@@ -82,7 +82,7 @@
 <h4>log4j.xml</h4>
 
 <p>Samza&#39;s <a href="packaging.html">run-class.sh</a> script will automatically set the following setting if log4j.xml exists in your <a href="packaging.html">Samza package&#39;s</a> lib directory.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">-Dlog4j.configuration=file:$base_dir/lib/log4j.xml
+<div class="highlight"><pre><code class="text language-text">-Dlog4j.configuration=file:$base_dir/lib/log4j.xml
 </code></pre></div>
 <!-- TODO add notes showing how to use task.opts for gc logging
 #### task.opts
@@ -95,7 +95,7 @@
 <h3>Garbage Collection Logging</h3>
 
 <p>Samza&#39;s will automatically set the following garbage collection logging setting, and will output it to <em>$SAMZA</em>_<em>LOG</em>_<em>DIR</em>/gc.log.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">-XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log
+<div class="highlight"><pre><code class="text language-text">-XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log
 </code></pre></div>
 <h4>Rotation</h4>
 

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/packaging.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/packaging.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/packaging.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/packaging.html Mon Sep  2 16:35:10 2013
@@ -66,13 +66,13 @@
           <h2>Packaging</h2>
 
 <p>The <a href="job-runner.html">JobRunner</a> page talks about run-job.sh, and how it&#39;s used to start a job either locally (LocalJobFactory) or with YARN (YarnJobFactory). In the diagram that shows the execution flow, it also shows a run-task.sh script. This script, along with a run-am.sh script, are what Samza actually calls to execute its code.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">bin/run-am.sh
+<div class="highlight"><pre><code class="text language-text">bin/run-am.sh
 bin/run-task.sh
 </code></pre></div>
 <p>The run-task.sh script is responsible for starting the TaskRunner. The run-am.sh script is responsible for starting Samza&#39;s application master for YARN. Thus, the run-am.sh script is only used by the YarnJob, but both YarnJob and ProcessJob use run-task.sh.</p>
 
 <p>Typically, these two scripts are bundled into a tar.gz file that has a structure like this:</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">bin/run-am.sh
+<div class="highlight"><pre><code class="text language-text">bin/run-am.sh
 bin/run-class.sh
 bin/run-job.sh
 bin/run-task.sh

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html Mon Sep  2 16:35:10 2013
@@ -68,7 +68,7 @@
 <p>When you define job.factory.class=samza.job.yarn.YarnJobFactory in your job&#39;s configuration, Samza will use YARN to execute your job. The YarnJobFactory will use the YARN_HOME environment variable on the machine that run-job.sh is executed on to get the appropriate YARN configuration, which will define where the YARN resource manager is. The YarnJob will work with the resource manager to get your job started on the YARN cluster.</p>
 
 <p>If you want to use YARN to run your Samza job, you&#39;ll also need to define the location of your Samza job&#39;s package. For example, you might say:</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">yarn.package.path=http://my.http.server/jobs/ingraphs-package-0.0.55.tgz
+<div class="highlight"><pre><code class="text language-text">yarn.package.path=http://my.http.server/jobs/ingraphs-package-0.0.55.tgz
 </code></pre></div>
 <p>This .tgz file follows the conventions outlined on the <a href="packaging.html">Packaging</a> page (it has bin/run-am.sh and bin/run-task.sh). YARN NodeManagers will take responsibility for downloading this .tgz file on the appropriate machines and untarring it. From there, YARN will execute run-am.sh or run-task.sh for the Samza Application Master and the TaskRunner, respectively.</p>
 

Modified: incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html Mon Sep  2 16:35:10 2013
@@ -74,7 +74,7 @@
 <h3>Auto-Create Topics</h3>
 
 <p>Kafka brokers should be configured to automatically create topics. Without this, it&#39;s going to be very cumbersome to run Samza jobs, since jobs will write to arbitrary (and sometimes new) topics.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">auto.create.topics.enable=true
+<div class="highlight"><pre><code class="text language-text">auto.create.topics.enable=true
 </code></pre></div>
 
         </div>

Modified: incubator/samza/site/sitemap.xml
URL: http://svn.apache.org/viewvc/incubator/samza/site/sitemap.xml?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/sitemap.xml (original)
+++ incubator/samza/site/sitemap.xml Mon Sep  2 16:35:10 2013
@@ -4,7 +4,7 @@
 
   <url>
     <loc>http://samza.incubator.apache.org/</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     <changefreq>daily</changefreq>
     <priority>1.0</priority>
   </url>
@@ -14,273 +14,273 @@
   
   <url>
     <loc>http://samza.incubator.apache.org/community/committers.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/community/irc.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/community/mailing-lists.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/code.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/coding-guide.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/disclaimer.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/projects.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/rules.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/seps.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/api/overview.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/introduction.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/mupd8.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/storm.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/checkpointing.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/event-loop.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/jmx.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/metrics.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/state-management.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/streams.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/task-runner.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/windowing.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/architecture.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/background.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/configuration.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/job-runner.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/logging.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/packaging.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/yarn-jobs.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/operations/kafka.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/operations/security.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/yarn/application-master.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/yarn/isolation.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/learn/tutorials/0.7.0/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/startup/download/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/startup/hello-samza/0.7.0/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>

Modified: incubator/samza/site/startup/download/index.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/startup/download/index.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/startup/download/index.html (original)
+++ incubator/samza/site/startup/download/index.html Mon Sep  2 16:35:10 2013
@@ -129,7 +129,7 @@ Snapshot builds are available in the Apa
 <h3>Checking out and Building</h3>
 
 <p>If you&#39;re interested in working on Samza, or building the JARs from scratch, then you&#39;ll need to check out and build the code. Samza does not have a binary release at this time. To check out and build Samza, run these commands.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git
+<div class="highlight"><pre><code class="text language-text">git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git
 cd incubator-samza
 ./gradlew clean build
 </code></pre></div>

Modified: incubator/samza/site/startup/hello-samza/0.7.0/index.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/startup/hello-samza/0.7.0/index.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/startup/hello-samza/0.7.0/index.html (original)
+++ incubator/samza/site/startup/hello-samza/0.7.0/index.html Mon Sep  2 16:35:10 2013
@@ -72,19 +72,19 @@
 <h3>Get the Code</h3>
 
 <p>You&#39;ll need to check out and publish Samza, since it&#39;s not available in a Maven repository right now.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git
+<div class="highlight"><pre><code class="text language-text">git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git
 cd incubator-samza
 ./gradlew -PscalaVersion=2.8.1 clean publishToMavenLocal
 </code></pre></div>
 <p>Next, check out the hello-samza project.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">git clone git://github.com/linkedin/hello-samza.git
+<div class="highlight"><pre><code class="text language-text">git clone git://github.com/linkedin/hello-samza.git
 </code></pre></div>
 <p>This project contains everything you&#39;ll need to run your first Samza jobs.</p>
 
 <h3>Start a Grid</h3>
 
 <p>A Samza grid usually comprises three different systems: <a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">YARN</a>, <a href="http://kafka.apache.org/">Kafka</a>, and <a href="http://zookeeper.apache.org/">ZooKeeper</a>. The hello-samza project comes with a script called &quot;grid&quot; to help you set up these systems. Start by running:</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">bin/grid
+<div class="highlight"><pre><code class="text language-text">bin/grid
 </code></pre></div>
 <p>This command will download, install, and start ZooKeeper, Kafka, and YARN. All package files will be put in a sub-directory called &quot;deploy&quot; inside hello-samza&#39;s root folder.</p>
 
@@ -93,34 +93,34 @@ cd incubator-samza
 <h3>Build a Samza Job Package</h3>
 
 <p>Before you can run a Samza job, you need to build a package for it. This package is what YARN uses to deploy your jobs on the grid.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">mvn clean package
+<div class="highlight"><pre><code class="text language-text">mvn clean package
 mkdir -p deploy/samza
 tar -xvf ./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz -C deploy/samza
 </code></pre></div>
 <h3>Run a Samza Job</h3>
 
 <p>After you&#39;ve built your Samza package, you can start a job on the grid using the run-job.sh script.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
+<div class="highlight"><pre><code class="text language-text">deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
 </code></pre></div>
 <p>The job will consume a feed of real-time edits from Wikipedia, and produce them to a Kafka topic called &quot;wikipedia-raw&quot;. Give the job a minute to start up, and then tail the Kafka topic:</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-raw
+<div class="highlight"><pre><code class="text language-text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-raw
 </code></pre></div>
 <p>Pretty neat, right? Now, check out the YARN UI again (<a href="http://localhost:8088">http://localhost:8088</a>). This time around, you&#39;ll see your Samza job is running!</p>
 
 <h3>Generate Wikipedia Statistics</h3>
 
 <p>Let&#39;s calculate some statistics based on the messages in the wikipedia-raw topic. Start two more jobs:</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-parser.properties
+<div class="highlight"><pre><code class="text language-text">deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-parser.properties
 deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-stats.properties
 </code></pre></div>
 <p>The first job (wikipedia-parser) parses the messages in wikipedia-raw, and extracts information about the size of the edit, who made the change, etc. You can take a look at its output with:</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-edits
+<div class="highlight"><pre><code class="text language-text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-edits
 </code></pre></div>
 <p>The last job (wikipedia-stats) reads messages from the wikipedia-edits topic, and calculates counts, every ten seconds, for all edits that were made during that window. It outputs these counts to the wikipedia-stats topic.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-stats
+<div class="highlight"><pre><code class="text language-text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 --topic wikipedia-stats
 </code></pre></div>
 <p>The messages in the stats topic look like this:</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">{&quot;is-talk&quot;:2,&quot;bytes-added&quot;:5276,&quot;edits&quot;:13,&quot;unique-titles&quot;:13}
+<div class="highlight"><pre><code class="text language-text">{&quot;is-talk&quot;:2,&quot;bytes-added&quot;:5276,&quot;edits&quot;:13,&quot;unique-titles&quot;:13}
 {&quot;is-bot-edit&quot;:1,&quot;is-talk&quot;:3,&quot;bytes-added&quot;:4211,&quot;edits&quot;:30,&quot;unique-titles&quot;:30,&quot;is-unpatrolled&quot;:1,&quot;is-new&quot;:2,&quot;is-minor&quot;:7}
 {&quot;bytes-added&quot;:3180,&quot;edits&quot;:19,&quot;unique-titles&quot;:19,&quot;is-unpatrolled&quot;:1,&quot;is-new&quot;:1,&quot;is-minor&quot;:3}
 {&quot;bytes-added&quot;:2218,&quot;edits&quot;:18,&quot;unique-titles&quot;:18,&quot;is-unpatrolled&quot;:2,&quot;is-new&quot;:2,&quot;is-minor&quot;:3}
@@ -130,7 +130,7 @@ deploy/samza/bin/run-job.sh --config-fac
 <h3>Shutdown</h3>
 
 <p>After you&#39;re done, you can clean everything up using the same grid script.</p>
-<div class="highlight"><pre><code class="text language-text" data-lang="text">bin/grid stop yarn
+<div class="highlight"><pre><code class="text language-text">bin/grid stop yarn
 bin/grid stop kafka
 bin/grid stop zookeeper
 </code></pre></div>