Posted to commits@samza.apache.org by aj...@apache.org on 2023/01/18 19:33:31 UTC

svn commit: r1906774 [30/49] - in /samza/site: ./ archive/ blog/ case-studies/ community/ contribute/ img/latest/learn/documentation/api/ learn/documentation/latest/ learn/documentation/latest/api/ learn/documentation/latest/api/javadocs/ learn/documen...

Modified: samza/site/learn/documentation/latest/connectors/kinesis.html
URL: http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/connectors/kinesis.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/connectors/kinesis.html (original)
+++ samza/site/learn/documentation/latest/connectors/kinesis.html Wed Jan 18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a href="/learn/documentation/1.8.0/connectors/kinesis">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a href="/learn/documentation/1.7.0/connectors/kinesis">1.7.0</a></li>
+
+              
+
               <li class="hide"><a href="/learn/documentation/1.6.0/connectors/kinesis">1.6.0</a></li>
 
               
@@ -639,15 +653,14 @@
    limitations under the License.
 -->
 
-<h3 id="kinesis-i-o-quickstart">Kinesis I/O: Quickstart</h3>
+<h3 id="kinesis-io-quickstart">Kinesis I/O: Quickstart</h3>
 
 <p>The Samza Kinesis connector allows you to interact with <a href="https://aws.amazon.com/kinesis/data-streams">Amazon Kinesis Data Streams</a>,
-Amazon’s data streaming service. The <code>hello-samza</code> project includes an example of processing Kinesis streams using Samza. Here is the complete <a href="https://github.com/apache/samza-hello-samza/blob/master/src/main/java/samza/examples/kinesis/KinesisHelloSamza.java">source code</a> and <a href="https://github.com/apache/samza-hello-samza/blob/master/src/main/config/kinesis-hello-samza.properties">configs</a>.
+Amazon’s data streaming service. The <code class="language-plaintext highlighter-rouge">hello-samza</code> project includes an example of processing Kinesis streams using Samza. Here is the complete <a href="https://github.com/apache/samza-hello-samza/blob/master/src/main/java/samza/examples/kinesis/KinesisHelloSamza.java">source code</a> and <a href="https://github.com/apache/samza-hello-samza/blob/master/src/main/config/kinesis-hello-samza.properties">configs</a>.
 You can build and run this example using this <a href="https://github.com/apache/samza-hello-samza#hello-samza">tutorial</a>.</p>
 
-<h3 id="data-format">Data Format</h3>
-
-<p>Like a Kafka topic, a Kinesis stream can have multiple shards with producers and consumers.
+<h3 id="data-format">Data Format</h3>
+
+<p>Like a Kafka topic, a Kinesis stream can have multiple shards with producers and consumers.
 Each message consumed from the stream is an instance of a Kinesis <a href="http://docs.aws.amazon.com/goto/WebAPI/kinesis-2013-12-02/Record">Record</a>.
 Samza’s <a href="https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisSystemConsumer.java">KinesisSystemConsumer</a>
 wraps the Record into a <a href="https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisIncomingMessageEnvelope.java">KinesisIncomingMessageEnvelope</a>.</p>
@@ -656,121 +669,118 @@ wraps the Record into a <a href="https:/
 
 <h4 id="basic-configuration">Basic Configuration</h4>
 
-<p>Here is the required configuration for consuming messages from Kinesis, through <code>KinesisSystemDescriptor</code> and <code>KinesisInputDescriptor</code>. </p>
+<p>Here is the required configuration for consuming messages from Kinesis, through <code class="language-plaintext highlighter-rouge">KinesisSystemDescriptor</code> and <code class="language-plaintext highlighter-rouge">KinesisInputDescriptor</code>.</p>
 
-<figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">KinesisSystemDescriptor</span> <span class="n">ksd</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KinesisSystemDescriptor</span><span class="o">(</span><span class="s">&quot;kinesis&quot;</span><span class="o">);</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">KinesisSystemDescriptor</span> <span class="n">ksd</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">KinesisSystemDescriptor</span><span class="o">(</span><span class="s">"kinesis"</span><span class="o">);</span>
     
-<span class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> 
-    <span class="n">ksd</span><span class="o">.</span><span class="na">getInputDescriptor</span><span class="o">(</span><span class="s">&quot;STREAM-NAME&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="n">NoOpSerde</span><span class="o">&lt;</span><span class="kt">byte</span><span class="o">[]&gt;())</span>
-          <span class="o">.</span><span class="na">withRegion</span><span class="o">(</span><span class="s">&quot;STREAM-REGION&quot;</span><span class="o">)</span>
-          <span class="o">.</span><span class="na">withAccessKey</span><span class="o">(</span><span class="s">&quot;YOUR-ACCESS_KEY&quot;</span><span class="o">)</span>
-          <span class="o">.</span><span class="na">withSecretKey</span><span class="o">(</span><span class="s">&quot;YOUR-SECRET-KEY&quot;</span><span class="o">);</span></code></pre></figure>
-
-<h4 id="coordination">Coordination</h4>
+<span class="nc">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> 
+    <span class="n">ksd</span><span class="o">.</span><span class="na">getInputDescriptor</span><span class="o">(</span><span class="s">"STREAM-NAME"</span><span class="o">,</span> <span class="k">new</span> <span class="nc">NoOpSerde</span><span class="o">&lt;</span><span class="kt">byte</span><span class="o">[]&gt;())</span>
+          <span class="o">.</span><span class="na">withRegion</span><span class="o">(</span><span class="s">"STREAM-REGION"</span><span class="o">)</span>
+          <span class="o">.</span><span class="na">withAccessKey</span><span class="o">(</span><span class="s">"YOUR-ACCESS_KEY"</span><span class="o">)</span>
+          <span class="o">.</span><span class="na">withSecretKey</span><span class="o">(</span><span class="s">"YOUR-SECRET-KEY"</span><span class="o">);</span></code></pre></figure>
 
-<p>The Kinesis system consumer does not rely on Samza&rsquo;s coordination mechanism. Instead, it uses the Kinesis client library (KCL) for coordination and distributing available shards among available instances. Hence, you should
-set your <code>grouper</code> configuration to <code>AllSspToSingleTaskGrouperFactory</code>.</p>
+<h4 id="coordination">Coordination</h4>
+
+<p>The Kinesis system consumer does not rely on Samza’s coordination mechanism. Instead, it uses the Kinesis Client Library (KCL) for coordination and for distributing the available shards among the available instances. Hence, you should
+set your <code class="language-plaintext highlighter-rouge">grouper</code> configuration to <code class="language-plaintext highlighter-rouge">AllSspToSingleTaskGrouperFactory</code>.</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"><span></span><span class="na">job.systemstreampartition.grouper.factory</span><span class="o">=</span><span class="s">org.apache.samza.container.grouper.stream.AllSspToSingleTaskGrouperFactory</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties">job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.AllSspToSingleTaskGrouperFactory</code></pre></figure>
 
-<h4 id="security">Security</h4>
+<h4 id="security">Security</h4>
 
-<p>Each Kinesis stream in a given AWS <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html">region</a> can be accessed by providing an <a href="https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys">access key</a>. An Access key consists of two parts: an access key ID (for example, <code>AKIAIOSFODNN7EXAMPLE</code>) and a secret access key (for example, <code>wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</code>) which you can use to send programmatic requests to AWS. </p>
+<p>Each Kinesis stream in a given AWS <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html">region</a> can be accessed by providing an <a href="https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys">access key</a>. An access key consists of two parts: an access key ID (for example, <code class="language-plaintext highlighter-rouge">AKIAIOSFODNN7EXAMPLE</code>) and a secret access key (for example, <code class="language-plaintext highlighter-rouge">wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</code>) which you can use to send programmatic requests to AWS.</p>
 
-<figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> 
-    <span class="n">ksd</span><span class="o">.</span><span class="na">getInputDescriptor</span><span class="o">(</span><span class="s">&quot;STREAM-NAME&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="n">NoOpSerde</span><span class="o">&lt;</span><span class="kt">byte</span><span class="o">[]&gt;())</span>
-          <span class="o">.</span><span class="na">withRegion</span><span class="o">(</span><span class="s">&quot;STREAM-REGION&quot;</span><span class="o">)</span>
-          <span class="o">.</span><span class="na">withAccessKey</span><span class="o">(</span><span class="s">&quot;YOUR-ACCESS_KEY&quot;</span><span class="o">)</span>
-          <span class="o">.</span><span class="na">withSecretKey</span><span class="o">(</span><span class="s">&quot;YOUR-SECRET-KEY&quot;</span><span class="o">);</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> 
+    <span class="n">ksd</span><span class="o">.</span><span class="na">getInputDescriptor</span><span class="o">(</span><span class="s">"STREAM-NAME"</span><span class="o">,</span> <span class="k">new</span> <span class="nc">NoOpSerde</span><span class="o">&lt;</span><span class="kt">byte</span><span class="o">[]&gt;())</span>
+          <span class="o">.</span><span class="na">withRegion</span><span class="o">(</span><span class="s">"STREAM-REGION"</span><span class="o">)</span>
+          <span class="o">.</span><span class="na">withAccessKey</span><span class="o">(</span><span class="s">"YOUR-ACCESS_KEY"</span><span class="o">)</span>
+          <span class="o">.</span><span class="na">withSecretKey</span><span class="o">(</span><span class="s">"YOUR-SECRET-KEY"</span><span class="o">);</span></code></pre></figure>
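Hard-coding credentials in job code, as in the placeholder snippet above, is generally discouraged. One common alternative (a sketch, not part of the Samza API; the helper class and fallback values are illustrative) is to resolve the keys from the standard AWS environment variables and pass the results to `withAccessKey`/`withSecretKey`:

```java
// Sketch: resolve AWS credentials from environment variables instead of
// hard-coding them. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are the
// standard AWS variable names; this helper class itself is illustrative.
public class AwsCredsFromEnv {
    static String envOrDefault(String key, String fallback) {
        String v = System.getenv(key);
        return (v == null || v.isEmpty()) ? fallback : v;
    }

    public static void main(String[] args) {
        String accessKey = envOrDefault("AWS_ACCESS_KEY_ID", "YOUR-ACCESS-KEY");
        String secretKey = envOrDefault("AWS_SECRET_ACCESS_KEY", "YOUR-SECRET-KEY");
        // These values would then be passed to
        // KinesisInputDescriptor#withAccessKey and #withSecretKey.
        System.out.println(accessKey.isEmpty() ? "missing" : "resolved");
    }
}
```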
 
 <h3 id="advanced-configuration">Advanced Configuration</h3>
 
 <h4 id="kinesis-client-library-configs">Kinesis Client Library Configs</h4>
-
 <p>Samza Kinesis Connector uses the <a href="https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-kcl.html#kinesis-record-processor-overview-kcl">Kinesis Client Library</a>
 (KCL) to access the Kinesis data streams. You can set any <a href="https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/coordinator/KinesisClientLibConfiguration.java">KCL Configuration</a>
-for a stream by configuring it through <code>KinesisInputDescriptor</code>.</p>
+for a stream by configuring it through <code class="language-plaintext highlighter-rouge">KinesisInputDescriptor</code>.</p>
 
-<figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
 
-<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">kclConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;CONFIG-PARAM&quot;</span><span class="o">,</span> <span class="s">&quot;CONFIG-VALUE&quot;</span><span class="o">);</span>
+<span class="nc">Map</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">String</span><span class="o">&gt;</span> <span class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">HashMap</span><span class="o">&lt;&gt;();</span>
+<span class="n">kclConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"CONFIG-PARAM"</span><span class="o">,</span> <span class="s">"CONFIG-VALUE"</span><span class="o">);</span>
 
 <span class="n">kid</span><span class="o">.</span><span class="na">withKCLConfig</span><span class="o">(</span><span class="n">kclConfig</span><span class="o">);</span></code></pre></figure>
 
-<p>As an example, the below configuration is equivalent to invoking <code>kclClient#WithTableName(myTable)</code> on the KCL instance.</p>
+<p>As an example, the configuration below is equivalent to invoking <code class="language-plaintext highlighter-rouge">kclClient#withTableName(myTable)</code> on the KCL instance.</p>
 
-<figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
 
-<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">kclConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;TableName&quot;</span><span class="o">,</span> <span class="s">&quot;myTable&quot;</span><span class="o">);</span>
+<span class="nc">Map</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">String</span><span class="o">&gt;</span> <span class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">HashMap</span><span class="o">&lt;&gt;();</span>
+<span class="n">kclConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"TableName"</span><span class="o">,</span> <span class="s">"myTable"</span><span class="o">);</span>
 
 <span class="n">kid</span><span class="o">.</span><span class="na">withKCLConfig</span><span class="o">(</span><span class="n">kclConfig</span><span class="o">);</span></code></pre></figure>
 
 <h4 id="aws-client-configs">AWS Client configs</h4>
-
 <p>Samza allows you to specify any <a href="http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html">AWS client configs</a> to connect to your Kinesis instance.
-You can configure any <a href="http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html">AWS client configuration</a> through <code>KinesisSystemDescriptor</code>.</p>
+You can configure any <a href="http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html">AWS client configuration</a> through <code class="language-plaintext highlighter-rouge">KinesisSystemDescriptor</code>.</p>
 
-<figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">awsConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">awsConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;CONFIG-PARAM&quot;</span><span class="o">,</span> <span class="s">&quot;CONFIG-VALUE&quot;</span><span class="o">);</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">Map</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">String</span><span class="o">&gt;</span> <span class="n">awsConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">HashMap</span><span class="o">&lt;&gt;;</span>
+<span class="n">awsConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"CONFIG-PARAM"</span><span class="o">,</span> <span class="s">"CONFIG-VALUE"</span><span class="o">);</span>
 
-<span class="n">KinesisSystemDescriptor</span> <span class="n">sd</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KinesisSystemDescriptor</span><span class="o">(</span><span class="n">systemName</span><span class="o">)</span>
+<span class="nc">KinesisSystemDescriptor</span> <span class="n">sd</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">KinesisSystemDescriptor</span><span class="o">(</span><span class="n">systemName</span><span class="o">)</span>
                                           <span class="o">.</span><span class="na">withAWSConfig</span><span class="o">(</span><span class="n">awsConfig</span><span class="o">);</span></code></pre></figure>
 
-<p>Through <code>KinesisSystemDescriptor</code> you can also set the <em>proxy host</em> and <em>proxy port</em> to be used by the Kinesis Client:</p>
+<p>Through <code class="language-plaintext highlighter-rouge">KinesisSystemDescriptor</code> you can also set the <em>proxy host</em> and <em>proxy port</em> to be used by the Kinesis Client:</p>
 
-<figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">KinesisSystemDescriptor</span> <span class="n">sd</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KinesisSystemDescriptor</span><span class="o">(</span><span class="n">systemName</span><span class="o">)</span>
-                                          <span class="o">.</span><span class="na">withProxyHost</span><span class="o">(</span><span class="s">&quot;YOUR-PROXY-HOST&quot;</span><span class="o">)</span>
-                                          <span class="o">.</span><span class="na">withProxyPort</span><span class="o">(</span><span class="n">YOUR</span><span class="o">-</span><span class="n">PROXY</span><span class="o">-</span><span class="n">PORT</span><span class="o">);</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">KinesisSystemDescriptor</span> <span class="n">sd</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">KinesisSystemDescriptor</span><span class="o">(</span><span class="n">systemName</span><span class="o">)</span>
+                                          <span class="o">.</span><span class="na">withProxyHost</span><span class="o">(</span><span class="s">"YOUR-PROXY-HOST"</span><span class="o">)</span>
+                                          <span class="o">.</span><span class="na">withProxyPort</span><span class="o">(</span><span class="no">YOUR</span><span class="o">-</span><span class="no">PROXY</span><span class="o">-</span><span class="no">PORT</span><span class="o">);</span></code></pre></figure>
 
 <h3 id="resetting-offsets">Resetting Offsets</h3>
 
 <p>Unlike other connectors where Samza stores and manages checkpointed offsets, Kinesis checkpoints are stored in a <a href="https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-ddb.html">DynamoDB</a> table.
-These checkpoints are stored and managed by the KCL library internally. You can reset the checkpoints by configuring a different name for the DynamoDB table. </p>
+These checkpoints are stored and managed internally by the KCL. You can reset the checkpoints by configuring a different name for the DynamoDB table.</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"><span></span><span class="c">// change the TableName to a unique name to reset checkpoints.</span>
-<span class="na">systems.kinesis-system.streams.STREAM-NAME.aws.kcl.TableName</span><span class="o">=</span><span class="s">my-app-table-name</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties">// change the TableName to a unique name to reset checkpoints.
+systems.kinesis-system.streams.STREAM-NAME.aws.kcl.TableName=my-app-table-name</code></pre></figure>
 
-<p>Or through <code>KinesisInputDescriptor</code></p>
+<p>Or through <code class="language-plaintext highlighter-rouge">KinesisInputDescriptor</code></p>
 
-<figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
 
-<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">kclConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;TableName&quot;</span><span class="o">,</span> <span class="s">&quot;my-new-app-table-name&quot;</span><span class="o">);</span>
+<span class="nc">Map</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">String</span><span class="o">&gt;</span> <span class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">HashMap</span><span class="o">&lt;&gt;();</span>
+<span class="n">kclConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"TableName"</span><span class="o">,</span> <span class="s">"my-new-app-table-name"</span><span class="o">);</span>
 
 <span class="n">kid</span><span class="o">.</span><span class="na">withKCLConfig</span><span class="o">(</span><span class="n">kclConfig</span><span class="o">);</span></code></pre></figure>
 
-<p>When you reset checkpoints, you can configure your job to start consuming from either the earliest or latest offset in the stream.  </p>
+<p>When you reset checkpoints, you can configure your job to start consuming from either the earliest or latest offset in the stream.</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"><span></span><span class="c">// set the starting position to either TRIM_HORIZON (oldest) or LATEST (latest)</span>
-<span class="na">systems.kinesis-system.streams.STREAM-NAME.aws.kcl.InitialPositionInStream</span><span class="o">=</span><span class="s">LATEST</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties">// set the starting position to either TRIM_HORIZON (oldest) or LATEST (latest)
+systems.kinesis-system.streams.STREAM-NAME.aws.kcl.InitialPositionInStream=LATEST</code></pre></figure>
 
-<p>Or through <code>KinesisInputDescriptor</code></p>
+<p>Or through <code class="language-plaintext highlighter-rouge">KinesisInputDescriptor</code></p>
 
-<figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">KinesisInputDescriptor</span><span class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
 
-<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">kclConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;InitialPositionInStream&quot;</span><span class="o">,</span> <span class="s">&quot;LATEST&quot;</span><span class="o">);</span>
+<span class="nc">Map</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">String</span><span class="o">&gt;</span> <span class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">HashMap</span><span class="o">&lt;&gt;();</span>
+<span class="n">kclConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"InitialPositionInStream"</span><span class="o">,</span> <span class="s">"LATEST"</span><span class="o">);</span>
 
 <span class="n">kid</span><span class="o">.</span><span class="na">withKCLConfig</span><span class="o">(</span><span class="n">kclConfig</span><span class="o">);</span></code></pre></figure>
 
 <p>Alternately, if you want to start from a particular offset in the Kinesis stream, you can login to the <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ConsoleDynamoDB.html">AWS console</a> and edit the offsets in your DynamoDB Table.
-By default, the table-name has the following format: &ldquo;&lt;job name&gt;-&lt;job id&gt;-&lt;kinesis stream&gt;&rdquo;.</p>
+By default, the table-name has the following format: “&lt;job name&gt;-&lt;job id&gt;-&lt;kinesis stream&gt;”.</p>
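The default table name stated above is plain string concatenation, which can be sketched as follows (the helper class is illustrative, not a Samza API; it only mirrors the documented “&lt;job name&gt;-&lt;job id&gt;-&lt;kinesis stream&gt;” convention):

```java
// Sketch: reconstruct the default KCL checkpoint table name,
// "<job name>-<job id>-<kinesis stream>", for locating the table
// in the DynamoDB console. Illustrative helper only.
public class KclTableName {
    static String defaultTableName(String jobName, String jobId, String streamName) {
        return jobName + "-" + jobId + "-" + streamName;
    }

    public static void main(String[] args) {
        // e.g. a job named "my-app" with id "1" reading "my-kinesis-stream"
        System.out.println(defaultTableName("my-app", "1", "my-kinesis-stream"));
        // my-app-1-my-kinesis-stream
    }
}
```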
 
 <h3 id="known-limitations">Known Limitations</h3>
 
 <p>The following limitations apply to Samza jobs consuming from Kinesis streams:</p>
 
 <ul>
-<li>Stateful processing (eg: windows or joins) is not supported on Kinesis streams. However, you can accomplish this by
+  <li>Stateful processing (e.g., windows or joins) is not supported on Kinesis streams. However, you can accomplish this by
 chaining two Samza jobs where the first job reads from Kinesis and sends to Kafka while the second job processes the
 data from Kafka.</li>
-<li>Kinesis streams cannot be configured as <a href="https://samza.apache.org/learn/documentation/latest/container/streams.html">bootstrap</a>
+  <li>Kinesis streams cannot be configured as <a href="https://samza.apache.org/learn/documentation/latest/container/streams.html">bootstrap</a>
 or <a href="https://samza.apache.org/learn/documentation/latest/container/samza-container.html">broadcast</a> streams.</li>
-<li>Kinesis streams must be used only with the <a href="https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/container/grouper/stream/AllSspToSingleTaskGrouperFactory.java">AllSspToSingleTaskGrouperFactory</a>
+  <li>Kinesis streams must be used only with the <a href="https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/container/grouper/stream/AllSspToSingleTaskGrouperFactory.java">AllSspToSingleTaskGrouperFactory</a>
 as the Kinesis consumer does the partition management by itself. No other grouper is currently supported.</li>
-<li>A Samza job that consumes from Kinesis cannot consume from any other input source. However, you can send your results
+  <li>A Samza job that consumes from Kinesis cannot consume from any other input source. However, you can send your results
 to any destination (e.g., Kafka, EventHubs), and have another Samza job consume them.</li>
 </ul>
 
@@ -778,6 +788,7 @@ to any destination (eg: Kafka, EventHubs
 
 <p>The KinesisSystemProducer for Samza is not yet implemented.</p>
 
+
            
         </div>
       </div>

Modified: samza/site/learn/documentation/latest/connectors/overview.html
URL: http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/connectors/overview.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/connectors/overview.html (original)
+++ samza/site/learn/documentation/latest/connectors/overview.html Wed Jan 18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a href="/learn/documentation/1.8.0/connectors/overview">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a href="/learn/documentation/1.7.0/connectors/overview">1.7.0</a></li>
+
+              
+
               <li class="hide"><a href="/learn/documentation/1.6.0/connectors/overview">1.6.0</a></li>
 
               
@@ -641,26 +655,42 @@
 
 <p>Stream processing applications often read data from external sources like Kafka or HDFS. Likewise, they require processed
 results to be written to external system or data stores. Samza is pluggable and designed to support a variety of <a href="/learn/documentation/latest/api/javadocs/org/apache/samza/system/SystemProducer.html">producers</a> and <a href="/learn/documentation/latest/api/javadocs/org/apache/samza/system/SystemConsumer.html">consumers</a> for your data. You can 
-integrate Samza with any streaming system by implementing the <a href="/learn/documentation/latest/api/javadocs/org/apache/samza/system/SystemFactory.html">SystemFactory</a> interface. </p>
+integrate Samza with any streaming system by implementing the <a href="/learn/documentation/latest/api/javadocs/org/apache/samza/system/SystemFactory.html">SystemFactory</a> interface.</p>
 
 <p>The following integrations are supported out-of-the-box:</p>
 
 <p>Consumers:</p>
 
 <ul>
-<li><p><a href="kafka">Apache Kafka</a> </p></li>
-<li><p><a href="eventhubs">Microsoft Azure Eventhubs</a> </p></li>
-<li><p><a href="kinesis">Amazon AWS Kinesis Streams</a> </p></li>
-<li><p><a href="hdfs">Hadoop Filesystem</a> </p></li>
+  <li>
+    <p><a href="kafka">Apache Kafka</a></p>
+  </li>
+  <li>
+    <p><a href="eventhubs">Microsoft Azure Eventhubs</a></p>
+  </li>
+  <li>
+    <p><a href="kinesis">Amazon AWS Kinesis Streams</a></p>
+  </li>
+  <li>
+    <p><a href="hdfs">Hadoop Filesystem</a></p>
+  </li>
 </ul>
 
 <p>Producers:</p>
 
 <ul>
-<li><p><a href="kafka">Apache Kafka</a> </p></li>
-<li><p><a href="eventhubs">Microsoft Azure Eventhubs</a> </p></li>
-<li><p><a href="hdfs">Hadoop Filesystem</a> </p></li>
-<li><p><a href="https://github.com/apache/samza/blob/master/samza-elasticsearch/src/main/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemProducer.java">Elasticsearch</a></p></li>
+  <li>
+    <p><a href="kafka">Apache Kafka</a></p>
+  </li>
+  <li>
+    <p><a href="eventhubs">Microsoft Azure Eventhubs</a></p>
+  </li>
+  <li>
+    <p><a href="hdfs">Hadoop Filesystem</a></p>
+  </li>
+  <li>
+    <p><a href="https://github.com/apache/samza/blob/master/samza-elasticsearch/src/main/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemProducer.java">Elasticsearch</a></p>
+  </li>
 </ul>
 
            

Modified: samza/site/learn/documentation/latest/container/checkpointing.html
URL: http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/checkpointing.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/checkpointing.html (original)
+++ samza/site/learn/documentation/latest/container/checkpointing.html Wed Jan 18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a href="/learn/documentation/1.8.0/container/checkpointing">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a href="/learn/documentation/1.7.0/container/checkpointing">1.7.0</a></li>
+
+              
+
               <li class="hide"><a href="/learn/documentation/1.6.0/container/checkpointing">1.6.0</a></li>
 
               
@@ -639,12 +653,12 @@
    limitations under the License.
 -->
 
-<p>Samza provides fault-tolerant processing of streams: Samza guarantees that messages won&rsquo;t be lost, even if your job crashes, if a machine dies, if there is a network fault, or something else goes wrong. In order to provide this guarantee, Samza expects the <a href="streams.html">input system</a> to meet the following requirements:</p>
+<p>Samza provides fault-tolerant processing of streams: Samza guarantees that messages won’t be lost, even if your job crashes, if a machine dies, if there is a network fault, or something else goes wrong. In order to provide this guarantee, Samza expects the <a href="streams.html">input system</a> to meet the following requirements:</p>
 
 <ul>
-<li>The stream may be sharded into one or more <em>partitions</em>. Each partition is independent from the others, and is replicated across multiple machines (the stream continues to be available, even if a machine fails).</li>
-<li>Each partition consists of a sequence of messages in a fixed order. Each message has an <em>offset</em>, which indicates its position in that sequence. Messages are always consumed sequentially within each partition.</li>
-<li>A Samza job can start consuming the sequence of messages from any starting offset.</li>
+  <li>The stream may be sharded into one or more <em>partitions</em>. Each partition is independent from the others, and is replicated across multiple machines (the stream continues to be available, even if a machine fails).</li>
+  <li>Each partition consists of a sequence of messages in a fixed order. Each message has an <em>offset</em>, which indicates its position in that sequence. Messages are always consumed sequentially within each partition.</li>
+  <li>A Samza job can start consuming the sequence of messages from any starting offset.</li>
 </ul>
 
 <p>Kafka meets these requirements, but they can also be implemented with other message broker systems.</p>
@@ -653,36 +667,36 @@
 
 <p>If a Samza container fails, it needs to be restarted (potentially on another machine) and resume processing where the failed container left off. In order to enable this, a container periodically checkpoints the current offset for each task instance.</p>
 
-<p><img src="/img/latest/learn/documentation/container/checkpointing.svg" alt="Illustration of checkpointing" class="diagram-large"></p>
+<p><img src="/img/latest/learn/documentation/container/checkpointing.svg" alt="Illustration of checkpointing" class="diagram-large" /></p>
 
-<p>When a Samza container starts up, it looks for the most recent checkpoint and starts consuming messages from the checkpointed offsets. If the previous container failed unexpectedly, the most recent checkpoint may be slightly behind the current offsets (i.e. the job may have consumed some more messages since the last checkpoint was written), but we can&rsquo;t know for sure. In that case, the job may process a few messages again.</p>
+<p>When a Samza container starts up, it looks for the most recent checkpoint and starts consuming messages from the checkpointed offsets. If the previous container failed unexpectedly, the most recent checkpoint may be slightly behind the current offsets (i.e. the job may have consumed some more messages since the last checkpoint was written), but we can’t know for sure. In that case, the job may process a few messages again.</p>
 
-<p>This guarantee is called <em>at-least-once processing</em>: Samza ensures that your job doesn&rsquo;t miss any messages, even if containers need to be restarted. However, it is possible for your job to see the same message more than once when a container is restarted. We are planning to address this in a future version of Samza, but for now it is just something to be aware of: for example, if you are counting page views, a forcefully killed container could cause events to be slightly over-counted. You can reduce duplication by checkpointing more frequently, at a slight performance cost.</p>
+<p>This guarantee is called <em>at-least-once processing</em>: Samza ensures that your job doesn’t miss any messages, even if containers need to be restarted. However, it is possible for your job to see the same message more than once when a container is restarted. We are planning to address this in a future version of Samza, but for now it is just something to be aware of: for example, if you are counting page views, a forcefully killed container could cause events to be slightly over-counted. You can reduce duplication by checkpointing more frequently, at a slight performance cost.</p>
 
-<p>For checkpoints to be effective, they need to be written somewhere where they will survive faults. Samza allows you to write checkpoints to the file system (using FileSystemCheckpointManager), but that doesn&rsquo;t help if the machine fails and the container needs to be restarted on another machine. The most common configuration is to use Kafka for checkpointing. You can enable this with the following job configuration:</p>
+<p>For checkpoints to be effective, they need to be written somewhere where they will survive faults. Samza allows you to write checkpoints to the file system (using FileSystemCheckpointManager), but that doesn’t help if the machine fails and the container needs to be restarted on another machine. The most common configuration is to use Kafka for checkpointing. You can enable this with the following job configuration:</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"><span></span><span class="c"># The name of your job determines the name under which checkpoints will be stored</span>
-<span class="na">job.name</span><span class="o">=</span><span class="s">example-job</span>
+<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"># The name of your job determines the name under which checkpoints will be stored
+job.name=example-job
 
-<span class="c"># Define a system called &quot;kafka&quot; for consuming and producing to a Kafka cluster</span>
-<span class="na">systems.kafka.samza.factory</span><span class="o">=</span><span class="s">org.apache.samza.system.kafka.KafkaSystemFactory</span>
+# Define a system called "kafka" for consuming and producing to a Kafka cluster
+systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
 
-<span class="c"># Declare that we want our job&#39;s checkpoints to be written to Kafka</span>
-<span class="na">task.checkpoint.factory</span><span class="o">=</span><span class="s">org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory</span>
-<span class="na">task.checkpoint.system</span><span class="o">=</span><span class="s">kafka</span>
+# Declare that we want our job's checkpoints to be written to Kafka
+task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
+task.checkpoint.system=kafka
 
-<span class="c"># By default, a checkpoint is written every 60 seconds. You can change this if you like.</span>
-<span class="na">task.commit.ms</span><span class="o">=</span><span class="s">60000</span></code></pre></figure>
+# By default, a checkpoint is written every 60 seconds. You can change this if you like.
+task.commit.ms=60000</code></pre></figure>
 
 <p>In this configuration, Samza writes checkpoints to a separate Kafka topic called __samza_checkpoint_&lt;job-name&gt;_&lt;job-id&gt; (in the example configuration above, the topic would be called __samza_checkpoint_example-job_1). Once per minute, Samza automatically sends a message to this topic, in which the current offsets of the input streams are encoded. When a Samza container starts up, it looks for the most recent offset message in this topic, and loads that checkpoint.</p>
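The checkpoint topic naming described above can be expressed as a small formatting helper. This is an illustrative sketch for clarity, not Samza's actual implementation (which constructs the name internally in its Kafka checkpoint manager):

```java
public class CheckpointTopicName {
    // Builds the checkpoint topic name "__samza_checkpoint_<job-name>_<job-id>"
    // described in the paragraph above.
    static String checkpointTopic(String jobName, String jobId) {
        return String.format("__samza_checkpoint_%s_%s", jobName, jobId);
    }

    public static void main(String[] args) {
        // For job.name=example-job with a job id of 1:
        System.out.println(checkpointTopic("example-job", "1"));
        // prints "__samza_checkpoint_example-job_1"
    }
}
```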
 
 <p>Sometimes it can be useful to use checkpoints only for some input streams, but not for others. In this case, you can tell Samza to ignore any checkpointed offsets for a particular stream name:</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"><span></span><span class="c"># Ignore any checkpoints for the topic &quot;my-special-topic&quot;</span>
-<span class="na">systems.kafka.streams.my-special-topic.samza.reset.offset</span><span class="o">=</span><span class="s">true</span>
+<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"># Ignore any checkpoints for the topic "my-special-topic"
+systems.kafka.streams.my-special-topic.samza.reset.offset=true
 
-<span class="c"># Always start consuming &quot;my-special-topic&quot; at the oldest available offset</span>
-<span class="na">systems.kafka.streams.my-special-topic.samza.offset.default</span><span class="o">=</span><span class="s">oldest</span></code></pre></figure>
+# Always start consuming "my-special-topic" at the oldest available offset
+systems.kafka.streams.my-special-topic.samza.offset.default=oldest</code></pre></figure>
 
 <p>The following table explains the meaning of these configuration parameters:</p>
 
@@ -696,7 +710,7 @@
   </thead>
   <tbody>
     <tr>
-      <td rowspan="2" class="nowrap">systems.&lt;system&gt;.<br>streams.&lt;stream&gt;.<br>samza.reset.offset</td>
+      <td rowspan="2" class="nowrap">systems.&lt;system&gt;.<br />streams.&lt;stream&gt;.<br />samza.reset.offset</td>
       <td>false (default)</td>
       <td>When container starts up, resume processing from last checkpoint</td>
     </tr>
@@ -705,7 +719,7 @@
       <td>Ignore checkpoint (pretend that no checkpoint is present)</td>
     </tr>
     <tr>
-      <td rowspan="2" class="nowrap">systems.&lt;system&gt;.<br>streams.&lt;stream&gt;.<br>samza.offset.default</td>
+      <td rowspan="2" class="nowrap">systems.&lt;system&gt;.<br />streams.&lt;stream&gt;.<br />samza.offset.default</td>
       <td>upcoming (default)</td>
       <td>When container starts and there is no checkpoint (or the checkpoint is ignored), only process messages that are published after the job is started, but no old messages</td>
     </tr>
@@ -720,43 +734,42 @@
 
 <h3 id="manipulating-checkpoints-manually">Manipulating Checkpoints Manually</h3>
 
-<p>If you want to make a one-off change to a job&rsquo;s consumer offsets, for example to force old messages to be <a href="../jobs/reprocessing.html">processed again</a> with a new version of your code, you can use CheckpointTool to inspect and manipulate the job&rsquo;s checkpoint. The tool is included in Samza&rsquo;s <a href="/contribute/code.html">source repository</a>.</p>
+<p>If you want to make a one-off change to a job’s consumer offsets, for example to force old messages to be <a href="../jobs/reprocessing.html">processed again</a> with a new version of your code, you can use CheckpointTool to inspect and manipulate the job’s checkpoint. The tool is included in Samza’s <a href="/contribute/code.html">source repository</a>.</p>
 
-<p>To inspect a job&rsquo;s latest checkpoint, you need to specify your job&rsquo;s config file, so that the tool knows which job it is dealing with:</p>
+<p>To inspect a job’s latest checkpoint, you need to specify your job’s config file, so that the tool knows which job it is dealing with:</p>
 
-<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>samza-example/target/bin/checkpoint-tool.sh <span class="se">\</span>
-  --config-path<span class="o">=</span>/path/to/job/config.properties</code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash">samza-example/target/bin/checkpoint-tool.sh <span class="se">\</span>
+  <span class="nt">--config-path</span><span class="o">=</span>/path/to/job/config.properties</code></pre></figure>
 
 <p>This command prints out the latest checkpoint in a properties file format. You can save the output to a file, and edit it as you wish. For example, to jump back to the oldest possible point in time, you can set all the offsets to 0. Then you can feed that properties file back into checkpoint-tool.sh and save the modified checkpoint:</p>
 
-<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>samza-example/target/bin/checkpoint-tool.sh <span class="se">\</span>
-  --config-path<span class="o">=</span>/path/to/job/config.properties <span class="se">\</span>
-  --new-offsets<span class="o">=</span>/path/to/new/offsets.properties</code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash">samza-example/target/bin/checkpoint-tool.sh <span class="se">\</span>
+  <span class="nt">--config-path</span><span class="o">=</span>/path/to/job/config.properties <span class="se">\</span>
+  <span class="nt">--new-offsets</span><span class="o">=</span>/path/to/new/offsets.properties</code></pre></figure>
 
 <p>Note that Samza only reads checkpoints on container startup. In order for your checkpoint change to take effect, you need to first stop the job, then save the modified offsets, and then start the job again. If you write a checkpoint while the job is running, it will most likely have no effect.</p>
 
 <h3 id="checkpoint-callbacks">Checkpoint Callbacks</h3>
-
 <p>Currently, Samza takes care of checkpointing for all systems. But there are some use cases in which we may need to inform the consumer about each checkpoint we make.
 Here are a few examples:</p>
 
 <ul>
-<li>Samza cannot do checkpointing correctly or efficiently. One such case is when Samza is not doing the partitioning. In this case the container doesn’t know which SSPs it is responsible for, and thus cannot checkpoint them. An actual example could be a system which relies on an auto-balanced High Level Kafka Consumer for partitioning.</li>
-<li>Systems in which the consumer itself needs to control the checkpointed offset. Some systems do not support seek() operation (are not replayable), but they rely on ACKs for the delivered messages. Example could be a Kinesis consumer. Kinesis library provides a checkpoint callback in the* process() *call (push system). This callback needs to be invoked after the records are processed. This can only be done by the consumer itself.</li>
-<li>Systems that use checkpoint/offset information for some maintenance actions. This information may be used to implement a smart retention policy (deleting all the data after it has been consumed).</li>
+  <li>Samza cannot do checkpointing correctly or efficiently. One such case is when Samza is not doing the partitioning. In this case the container doesn’t know which SSPs it is responsible for, and thus cannot checkpoint them. An actual example could be a system which relies on an auto-balanced High Level Kafka Consumer for partitioning.</li>
+  <li>Systems in which the consumer itself needs to control the checkpointed offset. Some systems do not support the seek() operation (they are not replayable), but rely on ACKs for delivered messages. An example is a Kinesis consumer: the Kinesis library provides a checkpoint callback in the <em>process()</em> call (push system). This callback needs to be invoked after the records are processed, which can only be done by the consumer itself.</li>
+  <li>Systems that use checkpoint/offset information for some maintenance actions. This information may be used to implement a smart retention policy (deleting all the data after it has been consumed).</li>
 </ul>
 
 <p>In order to use the checkpoint callback a SystemConsumer needs to implement the CheckpointListener interface:</p>
 
-<figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">CheckpointListener</span> <span class="o">{</span>
-  <span class="kt">void</span> <span class="nf">onCheckpoint</span><span class="o">(</span><span class="n">Map</span><span class="o">&lt;</span><span class="n">SystemStreamPartition</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">offsets</span><span class="o">);</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">CheckpointListener</span> <span class="o">{</span>
+  <span class="kt">void</span> <span class="nf">onCheckpoint</span><span class="o">(</span><span class="nc">Map</span><span class="o">&lt;</span><span class="nc">SystemStreamPartition</span><span class="o">,</span> <span class="nc">String</span><span class="o">&gt;</span> <span class="n">offsets</span><span class="o">);</span>
 <span class="o">}</span></code></pre></figure>
 
-<p>For the SystemConsumers which implement this interface Samza will invoke onCheckpoint() callback every time OffsetManager checkpoints. Checkpoints are done per task, and &lsquo;offsets&rsquo; are all the offsets Samza checkpoints for a task,
+<p>For SystemConsumers that implement this interface, Samza will invoke the onCheckpoint() callback every time the OffsetManager checkpoints. Checkpoints are done per task, and ‘offsets’ are all the offsets Samza checkpoints for a task,
 and these are the offsets which will be passed to the consumer on restart.
 Note that the callback will happen after the checkpoint and is <strong>not</strong> atomic.</p>
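To illustrate the callback contract, here is a minimal consumer sketch that records the last checkpointed offsets, e.g. so it could ACK delivered messages back to a push-based system. Note this is a simplified stand-in: the real CheckpointListener interface is keyed by SystemStreamPartition rather than String, and the SystemConsumer wiring is omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for org.apache.samza.checkpoint.CheckpointListener;
// the real interface takes a Map<SystemStreamPartition, String>.
interface SimpleCheckpointListener {
    void onCheckpoint(Map<String, String> offsets);
}

public class AckingConsumer implements SimpleCheckpointListener {
    private final Map<String, String> lastCheckpointed = new HashMap<>();

    @Override
    public void onCheckpoint(Map<String, String> offsets) {
        // Invoked after the OffsetManager checkpoints; note this is not
        // atomic with the checkpoint itself.
        lastCheckpointed.putAll(offsets);
    }

    public String lastOffsetFor(String partition) {
        return lastCheckpointed.get(partition);
    }

    public static void main(String[] args) {
        AckingConsumer consumer = new AckingConsumer();
        Map<String, String> offsets = new HashMap<>();
        offsets.put("kinesis.events.0", "49590338271490256608559692538");
        consumer.onCheckpoint(offsets);
        System.out.println(consumer.lastOffsetFor("kinesis.events.0"));
    }
}
```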
 
-<h2 id="state-management"><a href="state-management.html">State Management &raquo;</a></h2>
+<h2 id="state-management-"><a href="state-management.html">State Management »</a></h2>
 
            
         </div>

Modified: samza/site/learn/documentation/latest/container/coordinator-stream.html
URL: http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/coordinator-stream.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/coordinator-stream.html (original)
+++ samza/site/learn/documentation/latest/container/coordinator-stream.html Wed Jan 18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a href="/learn/documentation/1.8.0/container/coordinator-stream">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a href="/learn/documentation/1.7.0/container/coordinator-stream">1.7.0</a></li>
+
+              
+
               <li class="hide"><a href="/learn/documentation/1.6.0/container/coordinator-stream">1.6.0</a></li>
 
               
@@ -638,48 +652,50 @@
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
-
 <p>A Samza job is completely driven by its configuration. Thus, job configurations tend to be quite large. In order to easily serialize such large configs and persist them between job executions, Samza writes all configurations to a durable stream called the <em>Coordinator Stream</em> when a job is submitted.</p>
 
 <p>A Coordinator Stream is a single-partition stream to which the configurations are written. It shares the same characteristics as any input stream that can be configured in Samza: ordered, replayable, and fault-tolerant. The stream will contain three major types of messages:</p>
 
 <ol>
-<li>Job configuration messages</li>
-<li>Task changelog partition assignment messages</li>
-<li>Container locality message</li>
+  <li>Job configuration messages</li>
+  <li>Task changelog partition assignment messages</li>
+  <li>Container locality message</li>
 </ol>
 
 <h3 id="coordinator-stream-naming">Coordinator Stream Naming</h3>
 
-<p>The naming convention is very similar to that of the checkpoint topic that get&rsquo;s created.</p>
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="s">&quot;__samza_coordinator_%s_%s&quot;</span> <span class="n">format</span> <span class="o">(</span><span class="n">jobName</span><span class="o">.</span><span class="na">replaceAll</span><span class="o">(</span><span class="s">&quot;_&quot;</span><span class="o">,</span> <span class="s">&quot;-&quot;</span><span class="o">),</span> <span class="n">jobId</span><span class="o">.</span><span class="na">replaceAll</span><span class="o">(</span><span class="s">&quot;_&quot;</span><span class="o">,</span> <span class="s">&quot;-&quot;</span><span class="o">))</span>
-</code></pre></div>
-<h3 id="coordinator-stream-message-model">Coordinator Stream Message Model</h3>
+<p>The naming convention is very similar to that of the checkpoint topic that gets created.</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">"__samza_coordinator_%s_%s"</span> <span class="n">format</span> <span class="o">(</span><span class="n">jobName</span><span class="o">.</span><span class="na">replaceAll</span><span class="o">(</span><span class="s">"_"</span><span class="o">,</span> <span class="s">"-"</span><span class="o">),</span> <span class="n">jobId</span><span class="o">.</span><span class="na">replaceAll</span><span class="o">(</span><span class="s">"_"</span><span class="o">,</span> <span class="s">"-"</span><span class="o">))</span>
+</code></pre></div></div>
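The format expression above can be rendered in Java as a small helper (an illustrative sketch, not Samza's actual implementation):

```java
public class CoordinatorStreamName {
    // Mirrors the formatting shown above: underscores in the job name and
    // job id are replaced with dashes before being embedded in the name.
    static String coordinatorStream(String jobName, String jobId) {
        return String.format("__samza_coordinator_%s_%s",
                jobName.replaceAll("_", "-"), jobId.replaceAll("_", "-"));
    }

    public static void main(String[] args) {
        System.out.println(coordinatorStream("my_job", "1"));
        // prints "__samza_coordinator_my-job_1"
    }
}
```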
 
+<h3 id="coordinator-stream-message-model">Coordinator Stream Message Model</h3>
 <p>Coordinator stream messages are modeled as key/value pairs. The key is a list of well-defined fields: <em>version</em>, <em>type</em>, and <em>key</em>. The value is a <em>map</em>. There are some pre-defined fields (such as timestamp, host, etc.) for the value map, which are common to all messages.</p>
 
 <p>The full structure for a CoordinatorStreamMessage is:</p>
-<div class="highlight"><pre><code class="language-json" data-lang="json"><span></span><span class="err">key</span> <span class="err">=&gt;</span> <span class="p">[</span><span class="s2">&quot;&lt;version-number&gt;&quot;</span><span class="p">,</span> <span class="s2">&quot;&lt;message-type&gt;&quot;</span><span class="p">,</span> <span class="s2">&quot;&lt;key&gt;&quot;</span><span class="p">]</span>
 
-<span class="err">message</span> <span class="err">=&gt;</span> <span class="p">{</span>
-    <span class="nt">&quot;host&quot;</span><span class="p">:</span> <span class="s2">&quot;&lt;hostname&gt;&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;username&quot;</span><span class="p">:</span> <span class="s2">&quot;&lt;username&gt;&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;source&quot;</span><span class="p">:</span> <span class="s2">&quot;&lt;source-for-this-message&gt;&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;timestamp&quot;</span><span class="p">:</span> <span class="err">&lt;timestamp-of-the-message&gt;</span><span class="p">,</span>
-    <span class="nt">&quot;values&quot;</span><span class="p">:</span> <span class="p">{</span> <span class="p">}</span>
-<span class="p">}</span>
-</code></pre></div>
+<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">key</span><span class="w"> </span><span class="err">=&gt;</span><span class="w"> </span><span class="p">[</span><span class="s2">"&lt;version-number&gt;"</span><span class="p">,</span><span class="w"> </span><span class="s2">"&lt;message-type&gt;"</span><span class="p">,</span><span class="w"> </span><span class="s2">"&lt;key&gt;"</span><span class="p">]</span><span class="w">
+
+</span><span class="err">message</span><span class="w"> </span><span class="err">=&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
+    </span><span class="nl">"host"</span><span class="p">:</span><span class="w"> </span><span class="s2">"&lt;hostname&gt;"</span><span class="p">,</span><span class="w">
+    </span><span class="nl">"username"</span><span class="p">:</span><span class="w"> </span><span class="s2">"&lt;username&gt;"</span><span class="p">,</span><span class="w">
+    </span><span class="nl">"source"</span><span class="p">:</span><span class="w"> </span><span class="s2">"&lt;source-for-this-message&gt;"</span><span class="p">,</span><span class="w">
+    </span><span class="nl">"timestamp"</span><span class="p">:</span><span class="w"> </span><span class="err">&lt;timestamp-of-the-message&gt;</span><span class="p">,</span><span class="w">
+    </span><span class="nl">"values"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">}</span><span class="w">
+</span><span class="p">}</span><span class="w">
+</span></code></pre></div></div>
+
 <p>The messages are essentially serialized and transmitted over the wire as JSON blobs. Hence, for serialization to work correctly, it is very important not to have any unnecessary whitespace. The whitespace in the above JSON blob is shown for legibility only.</p>
 
 <p>The most important fields are type, key, and values:</p>
 
 <ul>
-<li>type - defines the kind of message</li>
-<li>key - defines a key to associate with the values</li>
-<li>values map - defined on a per-message-type basis, and defines a set of values associated with the type</li>
+  <li>type - defines the kind of message</li>
+  <li>key - defines a key to associate with the values</li>
+  <li>values map - defined on a per-message-type basis, and defines a set of values associated with the type</li>
 </ul>
 
-<p>The coordinator stream messages that are currently supported are listed below:
+<p>The coordinator stream messages that are currently supported are listed below:</p>
 <style>
             table th, table td {
                 text-align: left;
@@ -689,16 +705,17 @@
                 border-top: 1px solid #ccc;
                 border-left: 0;
                 border-right: 0;
-            }</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>        table td.property, table td.default {
-            white-space: nowrap;
-        }
-
-        table th {
-            background-color: #eee;
-        }
-</code></pre></div>
-<p></style>
+            }
+
+            table td.property, table td.default {
+                white-space: nowrap;
+            }
+
+            table th {
+                background-color: #eee;
+            }
+</style>
+
 <table>
     <tr>
         <th>Message</th>
@@ -709,36 +726,35 @@
     <tr>
         <td> Configuration Message <br />
             (Applies to all configuration <br />
-             options listed in <a href="../jobs/configuration-table.html">Configuration</a>) </td>
+             options listed in <a href="../jobs/configuration-table.html">Configuration</a>) </td>
         <td> set-config </td>
         <td> &lt;config-name&gt; </td>
-        <td> &lsquo;value&rsquo; =&gt; &lt;config-value&gt; </td>
+        <td> 'value' =&gt; &lt;config-value&gt; </td>
     </tr>
     <tr>
         <td> Task-ChangelogPartition Assignment Message </td>
         <td> set-changelog </td>
-        <td> &lt;<a href="../api/org/apache/samza/container/TaskName.java">TaskName</a>&gt; </td>
-        <td> &lsquo;partition&rsquo; =&gt; &lt;Changelog-Partition-Id&gt;
+        <td> &lt;<a href="../api/org/apache/samza/container/TaskName.java">TaskName</a>&gt; </td>
+        <td> 'partition' =&gt; &lt;Changelog-Partition-Id&gt;
         </td>
     </tr>
     <tr>
         <td> Container Locality Message </td>
         <td> set-container-host-assignment </td>
         <td> &lt;Container-Id&gt; </td>
-        <td> &lsquo;hostname&rsquo; =&gt; &lt;HostName&gt;
+        <td> 'hostname' =&gt; &lt;HostName&gt;
         </td>
     </tr>
-</table></p>
+</table>
 
 <h3 id="coordinator-stream-writer">Coordinator Stream Writer</h3>
-
 <p>Samza provides a command line tool to write Job Configuration messages to the coordinator stream. The tool can be used as follows:</p>
 
-<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>samza-example/target/bin/run-coordinator-stream-writer.sh <span class="se">\</span>
-  --config-path<span class="o">=</span>/path/to/job/config.properties <span class="se">\</span>
-  --type set-config <span class="se">\</span>
-  --key job.container.count <span class="se">\</span>
-  --value <span class="m">8</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash">samza-example/target/bin/run-coordinator-stream-writer.sh <span class="se">\</span>
+  <span class="nt">--config-path</span><span class="o">=</span>/path/to/job/config.properties <span class="se">\</span>
+  <span class="nt">--type</span> set-config <span class="se">\</span>
+  <span class="nt">--key</span> job.container.count <span class="se">\</span>
+  <span class="nt">--value</span> 8</code></pre></figure>
 
 <h2 id="job-coordinator"><a name="JobCoordinator"></a>Job Coordinator</h2>
 
@@ -746,7 +762,7 @@
 
 <p>Job Model is the data model used to represent a Samza job, which also incorporates the Job configuration. The hierarchy of a Samza job - job has containers, and each of the containers has tasks - is encapsulated in the Job Model, along with relevant information such as container id, task names, partition information, etc.</p>
 
-<p>The Job Coordinator exposes the Job Model and Job Configuration via an HTTP service. The URL for the Job Coordinator&rsquo;s HTTP service is passed as an environment variable to the Samza Containers when the containers are launched. Containers may write meta-information, such as locality - the hostname of the machine on which the container is running. However, they will read the Job Model and Configuration by querying the Job Coordinator via the HTTP service.</p>
+<p>The Job Coordinator exposes the Job Model and Job Configuration via an HTTP service. The URL for the Job Coordinator’s HTTP service is passed as an environment variable to the Samza Containers when the containers are launched. Containers may write meta-information, such as locality - the hostname of the machine on which the container is running. However, they will read the Job Model and Configuration by querying the Job Coordinator via the HTTP service.</p>
 
 <p>Thus, the Job Coordinator is the single component that has the latest view of the entire job status. This is very useful, as it allows us to extend the functionality of the Job Coordinator in the future to manage the lifecycle of the job (such as starting/stopping containers, modifying task assignments, etc.).</p>
 
@@ -755,21 +771,20 @@
 <p>The Job Coordinator resides in the same container as the Samza Application Master. Thus, the availability of the Job Coordinator is tied to the availability of the Application Master (AM) in the Yarn cluster. The Samza containers are started only after initializing the Job Coordinator from the Coordinator Stream. In stable condition, when the Samza container comes up, it should be able to read the JobModel from the Job Coordinator without timing out.</p>
 
 <h2 id="benefits-of-coordinator-stream-model">Benefits of Coordinator Stream Model</h2>
-
 <p>Writing the configuration to a durable stream opens the door for Samza to do several things:</p>
 
 <ol>
-<li>Removes the size-bound on the Job configuration</li>
-<li>Exposes job-related configuration and metadata to the containers using a standard data model and communication interface (See <a href="#JobCoordinator">Job Coordinator</a> for details)</li>
-<li>Certain configurations should only be set one time. Changing them in future deployment amounts to resetting the entire state of the job because it may re-shuffle input partitions to the containers. For example, changing <a href="../api/javadocs/org/apache/samza/container/grouper/stream/SystemStreamPartitionGrouper.java">SystemStreamPartitionGrouper</a> on a stateful Samza job would inter-mingle state from different StreamTasks in a single changelog partition. Without persistent configuration, there is no easy way to check whether a job&rsquo;s current configuration is valid or not.</li>
-<li>Job configuration can be dynamically changed by writing to the Coorinator Stream. This can enable features that require the job to be reactive to configuration change (eg. host-affinity, auto-scaling, dynamic reconfiguration etc).</li>
-<li>Provides a unified view of the job state, enabling Samza with more powerful ways of controlling container controls (See <a href="#JobCoordinator">Job Coordinator</a> for details)</li>
-<li>Enables future design of Job Coordinator fail-over since it serves as a single source of truth of the current job state</li>
+  <li>Removes the size-bound on the Job configuration</li>
+  <li>Exposes job-related configuration and metadata to the containers using a standard data model and communication interface (See <a href="#JobCoordinator">Job Coordinator</a> for details)</li>
+  <li>Certain configurations should only be set once. Changing them in a future deployment amounts to resetting the entire state of the job because it may re-shuffle input partitions to the containers. For example, changing <a href="../api/javadocs/org/apache/samza/container/grouper/stream/SystemStreamPartitionGrouper.java">SystemStreamPartitionGrouper</a> on a stateful Samza job would intermingle state from different StreamTasks in a single changelog partition. Without persistent configuration, there is no easy way to check whether a job’s current configuration is valid.</li>
+  <li>Job configuration can be dynamically changed by writing to the Coordinator Stream. This enables features that require the job to react to configuration changes (e.g. host-affinity, auto-scaling, dynamic reconfiguration, etc.).</li>
+  <li>Provides a unified view of the job state, enabling more powerful ways of controlling containers (See <a href="#JobCoordinator">Job Coordinator</a> for details)</li>
+  <li>Enables future design of Job Coordinator fail-over since it serves as a single source of truth of the current job state</li>
 </ol>
 
 <p>For other interesting features that can leverage this model, please refer the <a href="https://issues.apache.org/jira/secure/attachment/12670650/DESIGN-SAMZA-348-1.pdf">design document</a>.</p>
 
-<h2 id="event-loop"><a href="event-loop.html">Event Loop &raquo;</a></h2>
+<h2 id="event-loop-"><a href="event-loop.html">Event Loop »</a></h2>
 
            
         </div>

Modified: samza/site/learn/documentation/latest/container/event-loop.html
URL: http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/event-loop.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/event-loop.html (original)
+++ samza/site/learn/documentation/latest/container/event-loop.html Wed Jan 18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a href="/learn/documentation/1.8.0/container/event-loop">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a href="/learn/documentation/1.7.0/container/event-loop">1.7.0</a></li>
+
+              
+
               <li class="hide"><a href="/learn/documentation/1.6.0/container/event-loop">1.6.0</a></li>
 
               
@@ -639,13 +653,13 @@
    limitations under the License.
 -->
 
-<p>The event loop orchestrates <a href="streams.html">reading and processing messages</a>, <a href="checkpointing.html">checkpointing</a>, <a href="windowing.html">windowing</a> and <a href="metrics.html">flushing metrics</a> among tasks. </p>
+<p>The event loop orchestrates <a href="streams.html">reading and processing messages</a>, <a href="checkpointing.html">checkpointing</a>, <a href="windowing.html">windowing</a> and <a href="metrics.html">flushing metrics</a> among tasks.</p>
 
 <p>By default, Samza uses a single thread in each <a href="samza-container.html">container</a> to run the tasks. This fits CPU-bound jobs well; to get more processing power, simply add more containers. Single-threaded execution also simplifies sharing task state and resource management.</p>
 
 <p>For IO-bound jobs, Samza supports finer-grained parallelism for both synchronous and asynchronous tasks. For synchronous tasks (<a href="../api/javadocs/org/apache/samza/task/StreamTask.html">StreamTask</a> and <a href="../api/javadocs/org/apache/samza/task/WindowableTask.html">WindowableTask</a>), you can schedule them to run in parallel by configuring the built-in thread pool <a href="../jobs/configuration-table.html">job.container.thread.pool.size</a>. This fits the blocking-IO task scenario. For asynchronous tasks (<a href="../api/javadocs/org/apache/samza/task/AsyncStreamTask.html">AsyncStreamTask</a>), you can make async IO calls and trigger callbacks upon completion. The finest degree of parallelism Samza provides is within a task, and is configured by <a href="../jobs/configuration-table.html">task.max.concurrency</a>.</p>
 
-<p>The latest version of Samza is thread-safe. You can safely access your job’s state in <a href="state-management.html">key-value store</a>, write messages and checkpoint offset in the task threads. If you have other data shared among tasks, such as global variables or static data, it is not thread safe if the data can be accessed concurrently by multiple threads, e.g. StreamTask running in the configured thread pool with more than one threads. For states within a task, such as member variables, Samza guarantees the mutual exclusiveness of process, window and commit so there will be no concurrent modifications among these operations and any state change from one operation will be fully visible to the others.     </p>
+<p>The latest version of Samza is thread-safe. You can safely access your job’s state in the <a href="state-management.html">key-value store</a>, write messages, and checkpoint offsets in the task threads. If you have other data shared among tasks, such as global variables or static data, it is not thread-safe if the data can be accessed concurrently by multiple threads, e.g. a StreamTask running in the configured thread pool with more than one thread. For state within a task, such as member variables, Samza guarantees the mutual exclusiveness of process, window and commit, so there will be no concurrent modifications among these operations, and any state change from one operation will be fully visible to the others.</p>
 
 <h3 id="event-loop-internals">Event Loop Internals</h3>
 
@@ -654,23 +668,23 @@
 <p>The event loop works as follows:</p>
 
 <ol>
-<li>Choose a message from the incoming message queue;</li>
-<li>Schedule the appropriate <a href="samza-container.html">task instance</a> to process the message;</li>
-<li>Schedule window() on the task instance to run if it implements WindowableTask, and the window timer has been triggered;</li>
-<li>Send any output from the process() and window() calls to the appropriate <a href="../api/javadocs/org/apache/samza/system/SystemProducer.html">SystemProducers</a>;</li>
-<li>Write checkpoints and flush the state stores for any tasks whose <a href="checkpointing.html">commit interval</a> has elapsed.</li>
-<li>Block if all task instances are busy with processing outstanding messages, windowing or checkpointing.</li>
+  <li>Choose a message from the incoming message queue;</li>
+  <li>Schedule the appropriate <a href="samza-container.html">task instance</a> to process the message;</li>
+  <li>Schedule window() on the task instance to run if it implements WindowableTask, and the window timer has been triggered;</li>
+  <li>Send any output from the process() and window() calls to the appropriate <a href="../api/javadocs/org/apache/samza/system/SystemProducer.html">SystemProducers</a>;</li>
+  <li>Write checkpoints and flush the state stores for any tasks whose <a href="checkpointing.html">commit interval</a> has elapsed.</li>
+  <li>Block if all task instances are busy with processing outstanding messages, windowing or checkpointing.</li>
 </ol>
 
 <p>The container does this, in a loop, until it is shut down.</p>
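The six steps above can be sketched as a loop. This is a hypothetical, simplified model in Python; the real container logic lives in Samza's Scala/Java RunLoop, and the window/commit timers are replaced by unconditional calls for brevity.

```python
from collections import deque

# Hypothetical, simplified model of the six event-loop steps above;
# not Samza's actual implementation.
class TaskInstance:
    def __init__(self):
        self.processed = []
        self.windowed = False
        self.committed = False

    def process(self, msg):
        # steps 2 and 4: process the message and send any output
        self.processed.append(msg)

    def window(self):
        # step 3: window() runs when the window timer has triggered
        self.windowed = True

    def commit(self):
        # step 5: write checkpoints and flush state stores
        self.committed = True

def run_event_loop(queue, tasks, iterations):
    for _ in range(iterations):
        if queue:
            msg = queue.popleft()                 # step 1: choose a message
            tasks[msg["partition"]].process(msg)  # step 2: schedule the task
        for task in tasks.values():
            task.window()                         # step 3 (timer assumed fired)
            task.commit()                         # step 5 (interval assumed elapsed)
        # step 6: a real event loop blocks here when all tasks are busy

tasks = {0: TaskInstance()}
run_event_loop(deque([{"partition": 0, "value": "hello"}]), tasks, 1)
print(tasks[0].processed)
```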
 
-<h3 id="semantics-for-synchronous-tasks-v-s-asynchronous-tasks">Semantics for Synchronous Tasks v.s. Asynchronous Tasks</h3>
+<h3 id="semantics-for-synchronous-tasks-vs-asynchronous-tasks">Semantics for Synchronous Tasks vs. Asynchronous Tasks</h3>
 
 <p>The semantics of the event loop differs when running synchronous tasks and asynchronous tasks:</p>
 
 <ul>
-<li>For synchronous tasks (StreamTask and WindowableTask), process() and window() will run on the single main thread by default. You can configure job.container.thread.pool.size to be greater than 1, and event loop will schedule the process() and window() to run in the thread pool.<br></li>
-<li>For Asynchronous tasks (AsyncStreamTask), processAsync() will always be invoked in a single thread, while callbacks can be triggered from a different user thread. </li>
+  <li>For synchronous tasks (StreamTask and WindowableTask), process() and window() will run on the single main thread by default. You can configure job.container.thread.pool.size to be greater than 1, and the event loop will schedule process() and window() to run in the thread pool.</li>
+  <li>For asynchronous tasks (AsyncStreamTask), processAsync() will always be invoked in a single thread, while callbacks can be triggered from a different user thread.</li>
 </ul>
 
 <p>In both cases, the default concurrency within a task is 1, meaning at most one outstanding message in processing per task. This guarantees in-order message processing in a topic partition. You can further increase it by configuring task.max.concurrency to be greater than 1. This allows multiple outstanding messages to be processed in parallel by a task. This option increases the parallelism within a task, but may result in out-of-order processing and completion.</p>
@@ -678,20 +692,20 @@
 <p>The following semantics are guaranteed in any of the above cases (for happens-before semantics, see <a href="https://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html">here</a>):</p>
 
 <ul>
-<li>If task.max.concurrency = 1, each message process completion in a task is guaranteed to happen-before the next invocation of process()/processAsync() of the same task. If task.max.concurrency &gt; 1, there is no such happens-before constraint and user should synchronize access to any shared/global variables in the Task..</li>
-<li>WindowableTask.window() is called when no invocations to process()/processAsync() are pending and no new process()/processAsync() invocations can be scheduled until it completes. Therefore, a guarantee that all previous process()/processAsync() invocations happen before an invocation of WindowableTask.window(). An invocation to WindowableTask.window() is guaranteed to happen-before any subsequent process()/processAsync() invocations. The Samza engine is responsible for ensuring that window is invoked in a timely manner.</li>
-<li>Checkpointing is guaranteed to only cover events that are fully processed. It happens only when there are no pending process()/processAsync() or WindowableTask.window() invocations. All preceding invocations happen-before checkpointing and checkpointing happens-before all subsequent invocations.</li>
+  <li>If task.max.concurrency = 1, each message process completion in a task is guaranteed to happen-before the next invocation of process()/processAsync() of the same task. If task.max.concurrency &gt; 1, there is no such happens-before constraint, and users should synchronize access to any shared/global variables in the Task.</li>
+  <li>WindowableTask.window() is called when no invocations to process()/processAsync() are pending, and no new process()/processAsync() invocations can be scheduled until it completes. Therefore, all previous process()/processAsync() invocations are guaranteed to happen-before an invocation of WindowableTask.window(). An invocation to WindowableTask.window() is guaranteed to happen-before any subsequent process()/processAsync() invocations. The Samza engine is responsible for ensuring that window is invoked in a timely manner.</li>
+  <li>Checkpointing is guaranteed to only cover events that are fully processed. It happens only when there are no pending process()/processAsync() or WindowableTask.window() invocations. All preceding invocations happen-before checkpointing and checkpointing happens-before all subsequent invocations.</li>
 </ul>
 
 <p>More details and examples can be found in <a href="../../../tutorials/latest/samza-async-user-guide.html">Samza Async API and Multithreading User Guide</a>.</p>
 
 <h3 id="lifecycle">Lifecycle</h3>
 
-<p>The only way in which a developer can hook into a SamzaContainer&rsquo;s lifecycle is through the standard InitableTask, ClosableTask, StreamTask/AsyncStreamTask, and WindowableTask. In cases where pluggable logic needs to be added to wrap a StreamTask, the StreamTask can be wrapped by another StreamTask implementation that handles the custom logic before calling into the wrapped StreamTask.</p>
+<p>The only way in which a developer can hook into a SamzaContainer’s lifecycle is through the standard InitableTask, ClosableTask, StreamTask/AsyncStreamTask, and WindowableTask. In cases where pluggable logic needs to be added to wrap a StreamTask, the StreamTask can be wrapped by another StreamTask implementation that handles the custom logic before calling into the wrapped StreamTask.</p>
 
 <p>A concrete example is a set of StreamTasks that all want to share the same try/catch logic in their process() method. A StreamTask can be implemented that wraps the original StreamTasks, and surrounds the original process() call with the appropriate try/catch logic. For more details, see <a href="https://issues.apache.org/jira/browse/SAMZA-437">this discussion</a>.</p>
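The wrapping pattern can be sketched as follows, using Python stand-ins for Samza's Java StreamTask interface; the class and method shapes here are illustrative, not Samza's actual API.

```python
# Illustrative sketch of the StreamTask-wrapping pattern described above,
# with Python stand-ins for Samza's Java StreamTask interface.
class InnerTask:
    def process(self, envelope, collector, coordinator):
        if envelope is None:
            raise ValueError("bad envelope")
        collector.append(envelope)

class GuardedTask:
    """Wraps another task and surrounds process() with shared error handling."""
    def __init__(self, wrapped, errors):
        self.wrapped = wrapped
        self.errors = errors

    def process(self, envelope, collector, coordinator):
        try:
            # delegate to the wrapped task's process()
            self.wrapped.process(envelope, collector, coordinator)
        except ValueError as e:
            # shared try/except logic lives in one place
            self.errors.append(str(e))

errors, out = [], []
task = GuardedTask(InnerTask(), errors)
task.process("msg-1", out, None)  # succeeds
task.process(None, out, None)     # error is caught by the wrapper
```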
 
-<h2 id="metrics"><a href="metrics.html">Metrics &raquo;</a></h2>
+<h2 id="metrics-"><a href="metrics.html">Metrics »</a></h2>
 
            
         </div>

Modified: samza/site/learn/documentation/latest/container/jmx.html
URL: http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/jmx.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/jmx.html (original)
+++ samza/site/learn/documentation/latest/container/jmx.html Wed Jan 18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a href="/learn/documentation/1.8.0/container/jmx">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a href="/learn/documentation/1.7.0/container/jmx">1.7.0</a></li>
+
+              
+
               <li class="hide"><a href="/learn/documentation/1.6.0/container/jmx">1.6.0</a></li>
 
               
@@ -639,22 +653,24 @@
    limitations under the License.
 -->
 
-<p>Samza&rsquo;s containers and YARN ApplicationMaster enable <a href="http://docs.oracle.com/javase/tutorial/jmx/">JMX</a> by default. JMX can be used for managing the JVM; for example, you can connect to it using <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/management/jconsole.html">jconsole</a>, which is included in the JDK.</p>
+<p>Samza’s containers and YARN ApplicationMaster enable <a href="http://docs.oracle.com/javase/tutorial/jmx/">JMX</a> by default. JMX can be used for managing the JVM; for example, you can connect to it using <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/management/jconsole.html">jconsole</a>, which is included in the JDK.</p>
 
 <p>You can tell Samza to publish its internal <a href="metrics.html">metrics</a>, and any custom metrics you define, as JMX MBeans. To enable this, set the following properties in your job configuration:</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"><span></span><span class="c"># Define a Samza metrics reporter called &quot;jmx&quot;, which publishes to JMX</span>
-<span class="na">metrics.reporter.jmx.class</span><span class="o">=</span><span class="s">org.apache.samza.metrics.reporter.JmxReporterFactory</span>
+<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"># Define a Samza metrics reporter called "jmx", which publishes to JMX
+metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
+
+# Use it (if you have multiple reporters defined, separate them with commas)
+metrics.reporters=jmx</code></pre></figure>
 
-<span class="c"># Use it (if you have multiple reporters defined, separate them with commas)</span>
-<span class="na">metrics.reporters</span><span class="o">=</span><span class="s">jmx</span></code></pre></figure>
+<p>JMX needs to be configured to use a specific port, but in a distributed environment, there is no way of knowing in advance which ports are available on the machines running your containers. Therefore Samza chooses the JMX port randomly. If you need to connect to it, you can find the port by looking in the container’s logs, which report the JMX server details as follows:</p>
 
-<p>JMX needs to be configured to use a specific port, but in a distributed environment, there is no way of knowing in advance which ports are available on the machines running your containers. Therefore Samza chooses the JMX port randomly. If you need to connect to it, you can find the port by looking in the container&rsquo;s logs, which report the JMX server details as follows:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>2014-06-02 21:50:17 JmxServer [INFO] According to InetAddress.getLocalHost.getHostName we are samza-grid-1234.example.com
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2014-06-02 21:50:17 JmxServer [INFO] According to InetAddress.getLocalHost.getHostName we are samza-grid-1234.example.com
 2014-06-02 21:50:17 JmxServer [INFO] Started JmxServer registry port=50214 server port=50215 url=service:jmx:rmi://localhost:50215/jndi/rmi://localhost:50214/jmxrmi
 2014-06-02 21:50:17 JmxServer [INFO] If you are tunneling, you might want to try JmxServer registry port=50214 server port=50215 url=service:jmx:rmi://samza-grid-1234.example.com:50215/jndi/rmi://samza-grid-1234.example.com:50214/jmxrmi
-</code></pre></div>
-<h2 id="jobrunner"><a href="../jobs/job-runner.html">JobRunner &raquo;</a></h2>
+</code></pre></div></div>
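For example, the registry and server ports can be scraped from such a log line. This is a small sketch; the log text is taken verbatim from the sample output above.

```python
import re

# Extract the randomly chosen JMX ports from the container log line above.
log = ("2014-06-02 21:50:17 JmxServer [INFO] Started JmxServer "
       "registry port=50214 server port=50215 "
       "url=service:jmx:rmi://localhost:50215/jndi/rmi://localhost:50214/jmxrmi")

match = re.search(r"registry port=(\d+) server port=(\d+)", log)
registry_port, server_port = int(match.group(1)), int(match.group(2))
print(registry_port, server_port)
```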
+
+<h2 id="jobrunner-"><a href="../jobs/job-runner.html">JobRunner »</a></h2>
 
            
         </div>