Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/05/06 02:01:17 UTC

[GitHub] [kafka] mjsax opened a new pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

mjsax opened a new pull request #8621:
URL: https://github.com/apache/kafka/pull/8621


   Call for review @abbccdda @guozhangwang 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422310807



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />

Review comment:
       ```suggestion
           For more information on how this is done inside Kafka Streams, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
   ```







[GitHub] [kafka] mjsax commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r421166821



##########
File path: docs/upgrade.html
##########
@@ -19,6 +19,12 @@
 
 <script id="upgrade-template" type="text/x-handlebars-template">
 
+<h5><a id="upgrade_260_notable" href="#upgrade_260_notable">Notable changes in 2.6.0</a></h5>
+<ul>
+    <li>Kafka Streams adds a new processing mode (requires broker 2.5 or newer) that improves application

Review comment:
       I don't think we _need_ it, but it sounds like a good idea :)







[GitHub] [kafka] mjsax commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r421167972



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,16 +206,16 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
 
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
+        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value (default value is <b>at_least_once</b>) to <b>exactly_once</b> or <b>exactly_once_beta</b> (requires brokers version 2.5 or newer).
         More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.

Review comment:
       Oh, I forgot to extend this section... IMHO, it's good to have the details in the config section, as many users (especially existing users) won't read the "concepts" page.
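
       For readers following the thread, a minimal sketch of what enabling the new mode could look like on the application side. It uses plain `java.util.Properties` with string keys so that it runs without the Kafka client libraries; `processing.guarantee` and `exactly_once_beta` are taken from the diff above, while the class name, the application id, and the bootstrap address are placeholders chosen for illustration.

```java
import java.util.Properties;

public class EosBetaConfigSketch {
    // Builds a minimal Kafka Streams configuration as plain key/value pairs.
    // The application id and bootstrap address are hypothetical placeholders.
    static Properties buildConfig(String guarantee) {
        Properties props = new Properties();
        props.put("application.id", "eos-beta-demo");      // hypothetical value
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical value
        // Default is "at_least_once"; "exactly_once_beta" requires brokers 2.5 or newer.
        props.put("processing.guarantee", guarantee);
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig("exactly_once_beta");
        System.out.println("processing.guarantee=" + props.getProperty("processing.guarantee"));
    }
}
```

       In a real application these properties would be passed to the Streams client as usual; nothing else in the configuration needs to change for this sketch.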







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422315991



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -761,16 +784,19 @@ <h3><a id="streams_api_changes_0110" href="#streams_api_changes_0110">Streams AP
 
     <p> Metrics using exactly-once semantics: </p>
     <p>
-        If exactly-once processing is enabled via the <code>processing.guarantees</code> parameter, internally Streams switches from a producer per thread to a producer per task runtime model.
+        If <code>"exactly_once"</code> processing is enabled via the <code>processing.guarantee</code> parameter,
+        internally Streams switches from a producer per thread to a producer per task runtime model
+        (Note: using <code>"exactly_once_beta"</code> does use a producer per thread and thus <code>client.id</code> don't change

Review comment:
       ```suggestion
           Using <code>"exactly_once_beta"</code> does use a producer-per-thread, so <code>client.id</code> doesn't change,
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422312864



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter

Review comment:
       ```suggestion
           Starting in Kafka Streams 2.6.x, a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422309799



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
+
+        As of the 2.6.0 release, Kafka Streams supports an improve implementation of exactly-once processing called "exactly-once beta"
+        (requires broker version 2.5.0 or newer).
+        This implementation is more efficient (i.e., less client and broker resource utilization; like client threads, used network connections etc.)

Review comment:
       ```suggestion
           This implementation is more efficient, because it reduces client and broker resource utilization, like client threads and used network connections.
   ```







[GitHub] [kafka] abbccdda commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
abbccdda commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r421759079



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.

Review comment:
       We could also mention what happens if the brokers are not on version 2.5.x or newer: a hard crash during this transition.







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422310480



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
+
+        As of the 2.6.0 release, Kafka Streams supports an improve implementation of exactly-once processing called "exactly-once beta"
+        (requires broker version 2.5.0 or newer).
+        This implementation is more efficient (i.e., less client and broker resource utilization; like client threads, used network connections etc.)
+        and allows for higher throughput and improved scalability.
+        To read more details on how this is done inside the brokers and Kafka Streams, readers are recommended to read

Review comment:
       ```suggestion
           For more information on how this is done inside the brokers and Kafka Streams, see 
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422316124



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -761,16 +784,19 @@ <h3><a id="streams_api_changes_0110" href="#streams_api_changes_0110">Streams AP
 
     <p> Metrics using exactly-once semantics: </p>
     <p>
-        If exactly-once processing is enabled via the <code>processing.guarantees</code> parameter, internally Streams switches from a producer per thread to a producer per task runtime model.
+        If <code>"exactly_once"</code> processing is enabled via the <code>processing.guarantee</code> parameter,
+        internally Streams switches from a producer per thread to a producer per task runtime model
+        (Note: using <code>"exactly_once_beta"</code> does use a producer per thread and thus <code>client.id</code> don't change
+        compare to <code>"at_least_once"</code> for this case).

Review comment:
       ```suggestion
           compared with <code>"at_least_once"</code> for this case).
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422312582



##########
File path: docs/streams/developer-guide/config-streams.html
##########
@@ -456,13 +457,22 @@ <h4><a class="toc-backref" href="#id11">num.stream.threads</a><a class="headerli
         <div class="section" id="processing-guarantee">
           <span id="streams-developer-guide-processing-guarantee"></span><h4><a class="toc-backref" href="#id25">processing.guarantee</a><a class="headerlink" href="#processing-guarantee" title="Permalink to this headline"></a></h4>
           <blockquote>
-            <div>The processing guarantee that should be used. Possible values are <code class="docutils literal"><span class="pre">"at_least_once"</span></code> (default) and <code class="docutils literal"><span class="pre">"exactly_once"</span></code>.
-                 Note that if exactly-once processing is enabled, the default for parameter <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> changes to 100ms.
+            <div>The processing guarantee that should be used.
+                 Possible values are <code class="docutils literal"><span class="pre">"at_least_once"</span></code> (default),
+                 <code class="docutils literal"><span class="pre">"exactly_once"</span></code>,
+                 and <code class="docutils literal"><span class="pre">"exactly_once_beta"</span></code>.
+                 Using <code class="docutils literal"><span class="pre">"exactly_once"</span></code> requires broker
+                 version 0.11.0 or newer, while using <code class="docutils literal"><span class="pre">"exactly_once_beta"</span></code>
+                 requires broker version 2.5 or newer.
+                 Note that if exactly-once processing is enabled, the default for parameter
+                 <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> changes to 100ms.
                  Additionally, consumers are configured with <code class="docutils literal"><span class="pre">isolation.level="read_committed"</span></code>
-                 and producers are configured with <code class="docutils literal"><span class="pre">retries=Integer.MAX_VALUE</span></code>, <code class="docutils literal"><span class="pre">enable.idempotence=true</span></code>,
-                 and <code class="docutils literal"><span class="pre">max.in.flight.requests.per.connection=1</span></code> per default.
+                 and producers are configured with <code class="docutils literal"><span class="pre">enable.idempotence=true</span></code> per default.
                  Note that by default exactly-once processing requires a cluster of at least three brokers what is the recommended setting for production.
-                 For development you can change this, by adjusting broker setting <code class="docutils literal"><span class="pre">transaction.state.log.replication.factor</span></code> and <code class="docutils literal"><span class="pre">transaction.state.log.min.isr</span></code> to the number of broker you want to use.
+                 For development you can change this, by adjusting broker setting
+                 <code class="docutils literal"><span class="pre">transaction.state.log.replication.factor</span></code>
+                 and <code class="docutils literal"><span class="pre">transaction.state.log.min.isr</span></code>
+                 to the number of broker you want to use.

Review comment:
       ```suggestion
                    to the number of brokers you want to use.
   ```







[GitHub] [kafka] mjsax commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r421953049



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.

Review comment:
       I guess we can. However, isn't a hard crash _always_ the result? 







[GitHub] [kafka] abbccdda commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
abbccdda commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r420514826



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.
+        A switch from <code>"exactly_once"</code> to <code>"exactly_once_beta"</code> (or the other way around) is
+        only possible if the application is on version 2.6.x.
+        Hence, if you want to upgrade your application from an older version and enable this feature,
+        you first need to upgrade your application to version 2.6.x staying on <code>"exactly_once"</code>
+        and afterwards do second round of rolling bounces to switch to <code>"exactly_once_beta"</code>.
+        For a downgrade do the revers: first switch the config from <code>"exactly_once_beta"</code> to
+        <code>"exactly_once"</code>to disable the feature on your 2.6.x application.
+        Afterward, you can downgrade you application to a pre 2.6.x version.

Review comment:
       Typo: "downgrade you application" should be "downgrade your application".
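
       The two-round procedure described in this hunk can be sketched as an ordered list of rolling bounces. This is only an illustration of the ordering stated in the docs text; the class and method names are hypothetical and nothing here is part of the Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.List;

public class EosBetaRolloutSketch {
    // Hypothetical helper: ordered rolling-bounce rounds for enabling
    // "exactly_once_beta" on an application currently running "exactly_once".
    static List<String> enableSteps(boolean appAlreadyOn26) {
        List<String> steps = new ArrayList<>();
        if (!appAlreadyOn26) {
            // The switch is only possible on 2.6.x, so upgrade first.
            steps.add("upgrade application to 2.6.x, keep processing.guarantee=exactly_once");
        }
        steps.add("switch processing.guarantee to exactly_once_beta");
        return steps;
    }

    // Hypothetical helper: the reverse order for a downgrade to a pre-2.6.x release.
    static List<String> downgradeSteps() {
        List<String> steps = new ArrayList<>();
        steps.add("switch processing.guarantee back to exactly_once");
        steps.add("downgrade application to a pre-2.6.x version");
        return steps;
    }

    public static void main(String[] args) {
        enableSteps(false).forEach(step -> System.out.println("round: " + step));
        downgradeSteps().forEach(step -> System.out.println("round: " + step));
    }
}
```

       Each list entry stands for one full round of rolling bounces across all application instances.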

##########
File path: docs/streams/upgrade-guide.html
##########
@@ -72,6 +86,15 @@ <h1>Upgrade Guide and API Changes</h1>
         More details about the new config <code>StreamsConfig#TOPOLOGY_OPTIMIZATION</code> can be found in <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-295%3A+Add+Streams+Configuration+Allowing+for+Optional+Topology+Optimization">KIP-295</a>.
     </p>
 
+    <h3><a id="streams_api_changes_260" href="#streams_api_changes_260">Streams API changes in 2.6.0</a></h3>
+    <p>
+        We added a new processing mode that improves application scalability using exactly-once guarantees
+        (via <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics">KIP-447</a>>).
+        You can enable this new feature be setting the configuration parameter <code>processing.guarantee</code> to the
+        new value <code>"exactly_once_beta"</code>.
+        Note that you need brokers with version 2.5 or newer to use this new feature.

Review comment:
       nit: could just say `this feature`, as we already mentioned the new feature earlier.

##########
File path: docs/streams/core-concepts.html
##########
@@ -206,16 +206,16 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
 
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
+        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value (default value is <b>at_least_once</b>) to <b>exactly_once</b> or <b>exactly_once_beta</b> (requires brokers version 2.5 or newer).
         More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.

Review comment:
       It's a bit unfortunate that StreamsConfig doesn't give any further information on the difference between EOS and EOS beta. Maybe we could go into more detail here and compare the two? Users just need to understand that EOS beta guarantees the same semantics, with potentially better scalability, as an improvement in 2.5 & 2.6.
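
To make the comparison concrete, here is a minimal sketch of how the two modes would be selected. It uses plain `java.util.Properties` with the literal key and values from the docs under review. The class name, the helper, and the `application.id` and `bootstrap.servers` values are placeholders for illustration and are not taken from the PR.

```java
import java.util.Properties;

// Sketch only: builds Streams configs that differ solely in processing.guarantee.
final class ProcessingGuaranteeSketch {
    // Config key and values as they appear in the docs under review.
    static final String PROCESSING_GUARANTEE = "processing.guarantee";
    static final String EXACTLY_ONCE = "exactly_once";
    static final String EXACTLY_ONCE_BETA = "exactly_once_beta"; // needs brokers 2.5+

    // Hypothetical helper; application.id and bootstrap.servers are placeholders.
    static Properties streamsProps(String guarantee) {
        Properties props = new Properties();
        props.setProperty("application.id", "eos-demo-app");
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty(PROCESSING_GUARANTEE, guarantee);
        return props;
    }
}
```

From the user's point of view the two configurations differ in one value only, which matches the point that EOS beta keeps the same semantics. In a real application the resulting `Properties` would be passed to the `KafkaStreams` constructor, and in 2.6 the key and value are also exposed as `StreamsConfig.PROCESSING_GUARANTEE_CONFIG` and `StreamsConfig.EXACTLY_ONCE_BETA`.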

##########
File path: docs/upgrade.html
##########
@@ -19,6 +19,12 @@
 
 <script id="upgrade-template" type="text/x-handlebars-template">
 
+<h5><a id="upgrade_260_notable" href="#upgrade_260_notable">Notable changes in 2.6.0</a></h5>
+<ul>
+    <li>Kafka Streams adds a new processing mode (requires broker 2.5 or newer) that improves application

Review comment:
       Do we need to attach a link to the KIP here as well?

##########
File path: docs/streams/upgrade-guide.html
##########
@@ -72,6 +86,15 @@ <h1>Upgrade Guide and API Changes</h1>
         More details about the new config <code>StreamsConfig#TOPOLOGY_OPTIMIZATION</code> can be found in <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-295%3A+Add+Streams+Configuration+Allowing+for+Optional+Topology+Optimization">KIP-295</a>.
     </p>
 
+    <h3><a id="streams_api_changes_260" href="#streams_api_changes_260">Streams API changes in 2.6.0</a></h3>
+    <p>
+        We added a new processing mode that improves application scalability using exactly-once guarantees
+        (via <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics">KIP-447</a>>).
+        You can enable this new feature be setting the configuration parameter <code>processing.guarantee</code> to the

Review comment:
       by setting







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422314157



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.
+        A switch from <code>"exactly_once"</code> to <code>"exactly_once_beta"</code> (or the other way around) is
+        only possible if the application is on version 2.6.x.
+        Hence, if you want to upgrade your application from an older version and enable this feature,
+        you first need to upgrade your application to version 2.6.x staying on <code>"exactly_once"</code>
+        and afterwards do second round of rolling bounces to switch to <code>"exactly_once_beta"</code>.
+        For a downgrade do the revers: first switch the config from <code>"exactly_once_beta"</code> to
+        <code>"exactly_once"</code>to disable the feature on your 2.6.x application.

Review comment:
       ```suggestion
           <code>"exactly_once"</code> to disable the feature on your 2.6.x application.
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422309099



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
+
+        As of the 2.6.0 release, Kafka Streams supports an improve implementation of exactly-once processing called "exactly-once beta"

Review comment:
       ```suggestion
           As of the 2.6.0 release, Kafka Streams supports an improved implementation of exactly-once processing, named "exactly-once beta", 
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422310005



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
+
+        As of the 2.6.0 release, Kafka Streams supports an improve implementation of exactly-once processing called "exactly-once beta"
+        (requires broker version 2.5.0 or newer).
+        This implementation is more efficient (i.e., less client and broker resource utilization; like client threads, used network connections etc.)
+        and allows for higher throughput and improved scalability.

Review comment:
       ```suggestion
           and it enables higher throughput and improved scalability.
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422314861



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.
+        A switch from <code>"exactly_once"</code> to <code>"exactly_once_beta"</code> (or the other way around) is
+        only possible if the application is on version 2.6.x.
+        Hence, if you want to upgrade your application from an older version and enable this feature,
+        you first need to upgrade your application to version 2.6.x staying on <code>"exactly_once"</code>
+        and afterwards do second round of rolling bounces to switch to <code>"exactly_once_beta"</code>.
+        For a downgrade do the revers: first switch the config from <code>"exactly_once_beta"</code> to
+        <code>"exactly_once"</code>to disable the feature on your 2.6.x application.
+        Afterward, you can downgrade your application to a pre 2.6.x version.

Review comment:
       ```suggestion
           Afterward, you can downgrade your application to a pre-2.6.x version.
   ```
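
For illustration, the ordering rule in the hunk above (upgrade to 2.6.x first while staying on `"exactly_once"`, then switch the guarantee in a second round) can be sketched as a small planning helper. The class and method names are hypothetical, and treating versions newer than 2.6 the same as 2.6.x is an assumption that the quoted text does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: derives the ordered steps needed to enable "exactly_once_beta".
final class EosBetaRolloutSketch {
    // Assumption: any application version >= 2.6 may switch between the two modes.
    static boolean canSwitchMode(int major, int minor) {
        return major > 2 || (major == 2 && minor >= 6);
    }

    // Returns the steps for an application currently running major.minor on "exactly_once".
    static List<String> planEnableBeta(int major, int minor) {
        List<String> steps = new ArrayList<>();
        if (!canSwitchMode(major, minor)) {
            // First round: upgrade the application, keep the old guarantee.
            steps.add("upgrade application to 2.6.x, keep processing.guarantee=exactly_once");
        }
        // Second round of rolling bounces: only now change the guarantee.
        steps.add("switch processing.guarantee to exactly_once_beta");
        return steps;
    }
}
```

A downgrade would run the same two steps in reverse order: switch back to `"exactly_once"` first, then move to the pre-2.6.x version.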







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422309799



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
+
+        As of the 2.6.0 release, Kafka Streams supports an improve implementation of exactly-once processing called "exactly-once beta"
+        (requires broker version 2.5.0 or newer).
+        This implementation is more efficient (i.e., less client and broker resource utilization; like client threads, used network connections etc.)

Review comment:
       ```suggestion
           This implementation is more efficient, because it reduces client and broker resource utilization, like client threads and used network connections, 
   ```







[GitHub] [kafka] mjsax commented on pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
mjsax commented on pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#issuecomment-624968745


   @abbccdda Updated this PR -- added more details and fixed some links and inconsistencies.





[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422313981



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.
+        A switch from <code>"exactly_once"</code> to <code>"exactly_once_beta"</code> (or the other way around) is
+        only possible if the application is on version 2.6.x.
+        Hence, if you want to upgrade your application from an older version and enable this feature,
+        you first need to upgrade your application to version 2.6.x staying on <code>"exactly_once"</code>
+        and afterwards do second round of rolling bounces to switch to <code>"exactly_once_beta"</code>.
+        For a downgrade do the revers: first switch the config from <code>"exactly_once_beta"</code> to

Review comment:
       ```suggestion
           For a downgrade, do the reverse: first switch the config from <code>"exactly_once_beta"</code> to
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422312409



##########
File path: docs/streams/developer-guide/config-streams.html
##########
@@ -456,13 +457,22 @@ <h4><a class="toc-backref" href="#id11">num.stream.threads</a><a class="headerli
         <div class="section" id="processing-guarantee">
           <span id="streams-developer-guide-processing-guarantee"></span><h4><a class="toc-backref" href="#id25">processing.guarantee</a><a class="headerlink" href="#processing-guarantee" title="Permalink to this headline"></a></h4>
           <blockquote>
-            <div>The processing guarantee that should be used. Possible values are <code class="docutils literal"><span class="pre">"at_least_once"</span></code> (default) and <code class="docutils literal"><span class="pre">"exactly_once"</span></code>.
-                 Note that if exactly-once processing is enabled, the default for parameter <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> changes to 100ms.
+            <div>The processing guarantee that should be used.
+                 Possible values are <code class="docutils literal"><span class="pre">"at_least_once"</span></code> (default),
+                 <code class="docutils literal"><span class="pre">"exactly_once"</span></code>,
+                 and <code class="docutils literal"><span class="pre">"exactly_once_beta"</span></code>.
+                 Using <code class="docutils literal"><span class="pre">"exactly_once"</span></code> requires broker
+                 version 0.11.0 or newer, while using <code class="docutils literal"><span class="pre">"exactly_once_beta"</span></code>
+                 requires broker version 2.5 or newer.
+                 Note that if exactly-once processing is enabled, the default for parameter
+                 <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> changes to 100ms.
                  Additionally, consumers are configured with <code class="docutils literal"><span class="pre">isolation.level="read_committed"</span></code>
-                 and producers are configured with <code class="docutils literal"><span class="pre">retries=Integer.MAX_VALUE</span></code>, <code class="docutils literal"><span class="pre">enable.idempotence=true</span></code>,
-                 and <code class="docutils literal"><span class="pre">max.in.flight.requests.per.connection=1</span></code> per default.
+                 and producers are configured with <code class="docutils literal"><span class="pre">enable.idempotence=true</span></code> per default.
                  Note that by default exactly-once processing requires a cluster of at least three brokers what is the recommended setting for production.
-                 For development you can change this, by adjusting broker setting <code class="docutils literal"><span class="pre">transaction.state.log.replication.factor</span></code> and <code class="docutils literal"><span class="pre">transaction.state.log.min.isr</span></code> to the number of broker you want to use.
+                 For development you can change this, by adjusting broker setting

Review comment:
       ```suggestion
                    For development, you can change this configuration by adjusting broker setting
   ```
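
As a concrete reading of the development note in this hunk, the sketch below sets both broker settings to the number of brokers in use. These are broker-side settings that would normally live in the broker's own configuration file; expressing them as `java.util.Properties` (for example, as overrides for a test broker) is only an illustrative assumption, and the class and method names are hypothetical.

```java
import java.util.Properties;

// Sketch only: broker overrides for running exactly-once with fewer than three brokers.
final class DevBrokerOverridesSketch {
    // Sets both transaction-log settings to the number of brokers, as the docs suggest.
    static Properties forBrokerCount(int brokerCount) {
        if (brokerCount < 1) {
            throw new IllegalArgumentException("brokerCount must be >= 1");
        }
        Properties overrides = new Properties();
        String n = String.valueOf(brokerCount);
        overrides.setProperty("transaction.state.log.replication.factor", n);
        overrides.setProperty("transaction.state.log.min.isr", n);
        return overrides;
    }
}
```

For a single-broker development setup, `forBrokerCount(1)` yields both settings with the value `1`. This is not appropriate for production, where the docs recommend at least three brokers.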







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422311215



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
+
+        As of the 2.6.0 release, Kafka Streams supports an improve implementation of exactly-once processing called "exactly-once beta"
+        (requires broker version 2.5.0 or newer).
+        This implementation is more efficient (i.e., less client and broker resource utilization; like client threads, used network connections etc.)
+        and allows for higher throughput and improved scalability.
+        To read more details on how this is done inside the brokers and Kafka Streams, readers are recommended to read
+        <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics">KIP-447</a>.<br />
+
+        In order to achieve exactly-once semantics when running Kafka Streams applications,

Review comment:
       ```suggestion
           To enable exactly-once semantics when running Kafka Streams applications,
   ```

##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
+
+        As of the 2.6.0 release, Kafka Streams supports an improve implementation of exactly-once processing called "exactly-once beta"
+        (requires broker version 2.5.0 or newer).
+        This implementation is more efficient (i.e., less client and broker resource utilization; like client threads, used network connections etc.)
+        and allows for higher throughput and improved scalability.
+        To read more details on how this is done inside the brokers and Kafka Streams, readers are recommended to read
+        <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics">KIP-447</a>.<br />
+
+        In order to achieve exactly-once semantics when running Kafka Streams applications,
+        users can simply set the <code>processing.guarantee</code> config value (default value is <b>at_least_once</b>)

Review comment:
       ```suggestion
           set the <code>processing.guarantee</code> config value (default value is <b>at_least_once</b>)
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422311625



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
+
+        As of the 2.6.0 release, Kafka Streams supports an improve implementation of exactly-once processing called "exactly-once beta"
+        (requires broker version 2.5.0 or newer).
+        This implementation is more efficient (i.e., less client and broker resource utilization; like client threads, used network connections etc.)
+        and allows for higher throughput and improved scalability.
+        To read more details on how this is done inside the brokers and Kafka Streams, readers are recommended to read
+        <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics">KIP-447</a>.<br />
+
+        In order to achieve exactly-once semantics when running Kafka Streams applications,
+        users can simply set the <code>processing.guarantee</code> config value (default value is <b>at_least_once</b>)
+        to <b>exactly_once</b> (requires brokers version 0.11.0 or newer) or <b>exactly_once_beta</b> (requires brokers version 2.5 or newer).
+        More details can be found in the <a href="/{{version}}/documentation/streams/developer-guide/config-streams.html">Kafka Streams Configs</a> section.

Review comment:
       ```suggestion
           For more information, see the <a href="/{{version}}/documentation/streams/developer-guide/config-streams.html">Kafka Streams Configs</a> section.
   ```
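
   For readers following this hunk, the setting it describes can be sketched in code. This is a minimal illustration, not the PR's code: it uses plain `java.util.Properties` with string keys so it runs without the Kafka client on the classpath. The class name `EosConfigSketch`, the helper `buildConfig`, and the `application.id` / `bootstrap.servers` values are placeholders assumed for the example. Only the `processing.guarantee` key and its values come from the diff above.

   ```java
   import java.util.Properties;

   public class EosConfigSketch {
       // Builds a Streams-style config. `guarantee` is one of the values named in the diff:
       // "at_least_once" (the default), "exactly_once", or "exactly_once_beta".
       static Properties buildConfig(String guarantee) {
           Properties props = new Properties();
           props.put("application.id", "eos-sketch-app");      // placeholder value
           props.put("bootstrap.servers", "localhost:9092");   // placeholder value
           props.put("processing.guarantee", guarantee);
           return props;
       }

       public static void main(String[] args) {
           // Per the diff, "exactly_once_beta" requires brokers on version 2.5 or newer.
           Properties props = buildConfig("exactly_once_beta");
           System.out.println(props.getProperty("processing.guarantee"));
       }
   }
   ```

   In a real application the same `Properties` object would presumably be passed to the Streams client; that wiring is left out here to keep the sketch dependency-free.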







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422314157



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.
+        A switch from <code>"exactly_once"</code> to <code>"exactly_once_beta"</code> (or the other way around) is
+        only possible if the application is on version 2.6.x.
+        Hence, if you want to upgrade your application from an older version and enable this feature,
+        you first need to upgrade your application to version 2.6.x staying on <code>"exactly_once"</code>
+        and afterwards do second round of rolling bounces to switch to <code>"exactly_once_beta"</code>.
+        For a downgrade do the revers: first switch the config from <code>"exactly_once_beta"</code> to
+        <code>"exactly_once"</code>to disable the feature on your 2.6.x application.

Review comment:
       ```suggestion
           <code>"exactly_once"</code> to disable the feature in your 2.6.x application.
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422313349



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.
+        A switch from <code>"exactly_once"</code> to <code>"exactly_once_beta"</code> (or the other way around) is
+        only possible if the application is on version 2.6.x.
+        Hence, if you want to upgrade your application from an older version and enable this feature,
+        you first need to upgrade your application to version 2.6.x staying on <code>"exactly_once"</code>

Review comment:
       ```suggestion
           you first need to upgrade your application to version 2.6.x, staying on <code>"exactly_once"</code>,
   ```







[GitHub] [kafka] mjsax commented on pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
mjsax commented on pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#issuecomment-625646077


   Java 8 failed: `org.apache.kafka.streams.integration.QueryableStateIntegrationTest.shouldAllowConcurrentAccesses`
   Java 11 failed:
   `org.apache.kafka.streams.integration.EosBetaUpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosBeta[true]`
   
   Java 14 passed.





[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422313044



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.
+        A switch from <code>"exactly_once"</code> to <code>"exactly_once_beta"</code> (or the other way around) is
+        only possible if the application is on version 2.6.x.
+        Hence, if you want to upgrade your application from an older version and enable this feature,

Review comment:
       ```suggestion
           If you want to upgrade your application from an older version and enable this feature,
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422315269



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -761,16 +784,19 @@ <h3><a id="streams_api_changes_0110" href="#streams_api_changes_0110">Streams AP
 
     <p> Metrics using exactly-once semantics: </p>
     <p>
-        If exactly-once processing is enabled via the <code>processing.guarantees</code> parameter, internally Streams switches from a producer per thread to a producer per task runtime model.
+        If <code>"exactly_once"</code> processing is enabled via the <code>processing.guarantee</code> parameter,
+        internally Streams switches from a producer per thread to a producer per task runtime model

Review comment:
       ```suggestion
           internally Streams switches from a producer-per-thread to a producer-per-task runtime model.
   ```







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422313531



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -52,6 +52,20 @@ <h1>Upgrade Guide and API Changes</h1>
         <li> restart all new ({{fullDotVersion}}) application instances </li>
     </ul>
 
+    <p>
+        As of Kafka Streams 2.6.x a new processing mode <code>"exactly_once_beta"</code> (configurable via parameter
+        <code>processing.guarantee</code>) is available.
+        To use this new feature, your brokers must be on version 2.5.x or newer.
+        A switch from <code>"exactly_once"</code> to <code>"exactly_once_beta"</code> (or the other way around) is
+        only possible if the application is on version 2.6.x.
+        Hence, if you want to upgrade your application from an older version and enable this feature,
+        you first need to upgrade your application to version 2.6.x staying on <code>"exactly_once"</code>
+        and afterwards do second round of rolling bounces to switch to <code>"exactly_once_beta"</code>.

Review comment:
       ```suggestion
           and then do a second round of rolling bounces to switch to <code>"exactly_once_beta"</code>.
   ```
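
   The upgrade order described in this paragraph can be sketched as a small planner. This is an illustration under stated assumptions, not part of the PR: the class `EosUpgradePlanSketch` and the method `planUpgrade` are hypothetical names, and the sketch covers only the upgrade direction as the diff describes it (first move to 2.6.x while staying on `exactly_once`, then switch to `exactly_once_beta` in a second round of rolling bounces).

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class EosUpgradePlanSketch {
       // Returns the rolling-bounce rounds needed to end up on "exactly_once_beta".
       // `onVersion26` says whether the application already runs on 2.6.x.
       static List<String> planUpgrade(boolean onVersion26) {
           List<String> rounds = new ArrayList<>();
           if (!onVersion26) {
               // Round 1: upgrade the application but leave the guarantee unchanged.
               rounds.add("upgrade to 2.6.x, keep processing.guarantee=exactly_once");
           }
           // Final round: the switch is only possible once the application is on 2.6.x.
           rounds.add("set processing.guarantee=exactly_once_beta");
           return rounds;
       }

       public static void main(String[] args) {
           for (String round : planUpgrade(false)) {
               System.out.println("rolling bounce: " + round);
           }
       }
   }
   ```

   The sketch does not check the broker version; per the paragraph above, brokers must already be on 2.5.x or newer before the final round.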







[GitHub] [kafka] JimGalasyn commented on a change in pull request #8621: KAFKA-9466: Update Kafka Streams docs for KIP-447

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #8621:
URL: https://github.com/apache/kafka/pull/8621#discussion_r422309232



##########
File path: docs/streams/core-concepts.html
##########
@@ -206,17 +206,26 @@ <h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Pr
         to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
         Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics.
         In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
-        no duplicates will be generated throughout the pipeline.
+        no duplicates will be generated throughout the pipeline.<br />
 
         Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
         and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
         More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
         Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that
         commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
-        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
-
-        In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
-        More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
+        To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.<br />
+
+        As of the 2.6.0 release, Kafka Streams supports an improve implementation of exactly-once processing called "exactly-once beta"
+        (requires broker version 2.5.0 or newer).

Review comment:
       ```suggestion
           which requires broker version 2.5.0 or newer.
   ```



