Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/07/08 20:28:34 UTC

[GitHub] [kafka] vvcephei opened a new pull request #11003: KAFKA-12360: Document new time semantics

vvcephei opened a new pull request #11003:
URL: https://github.com/apache/kafka/pull/11003


   Update the docs for task idling, since the semantics have
   changed in 3.0.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] vvcephei commented on pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei commented on pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#issuecomment-878602756


   Merged and cherry-picked to 3.0 (cc @kkonstantine )



[GitHub] [kafka] JimGalasyn commented on a change in pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#discussion_r666502845



##########
File path: docs/streams/developer-guide/config-streams.html
##########
@@ -600,8 +613,34 @@ <h4><a class="toc-backref" href="#id33">default.windowed.value.serde.inner</a><a
           <span id="streams-developer-guide-max-task-idle-ms"></span><h4><a class="toc-backref" href="#id28">max.task.idle.ms</a><a class="headerlink" href="#max-task-idle-ms" title="Permalink to this headline"></a></h4>
           <blockquote>
             <div>
-              The maximum amount of time a task will idle without processing data when waiting for all of its input partition buffers to contain records. This can help avoid potential out-of-order
-              processing when the task has multiple input streams, as in a join, for example. Setting this to a nonzero value may increase latency but will improve time synchronization.
+              <p>
+                In a task with multiple input partitions (as in a join or merge), this is the amount
+                of time to block processing to wait for new data to be produced to a partition we

Review comment:
       Maybe, "to a partition that is caught up"? (to remove "we" from the sentence)





[GitHub] [kafka] showuon commented on a change in pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
showuon commented on a change in pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#discussion_r666903566



##########
File path: docs/streams/developer-guide/config-streams.html
##########
@@ -600,8 +613,54 @@ <h4><a class="toc-backref" href="#id33">default.windowed.value.serde.inner</a><a
           <span id="streams-developer-guide-max-task-idle-ms"></span><h4><a class="toc-backref" href="#id28">max.task.idle.ms</a><a class="headerlink" href="#max-task-idle-ms" title="Permalink to this headline"></a></h4>
           <blockquote>
             <div>
-              The maximum amount of time a task will idle without processing data when waiting for all of its input partition buffers to contain records. This can help avoid potential out-of-order
-              processing when the task has multiple input streams, as in a join, for example. Setting this to a nonzero value may increase latency but will improve time synchronization.
+              <p>
+                This configuration controls how long Streams will wait to fetch data in order to
+                provide in-order processing semantics.
+              </p>
+              <p>
+                When processing a task has multiple input partitions (as in a join or merge),

Review comment:
       When processing a task **has** multiple input partitions -> When processing a task **having** multiple input partitions
   Is that right?





[GitHub] [kafka] vvcephei commented on pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei commented on pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#issuecomment-878599816


   Thanks, @JimGalasyn , @showuon , and @abbccdda !



[GitHub] [kafka] JimGalasyn commented on a change in pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#discussion_r666503902



##########
File path: streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java
##########
@@ -427,8 +427,18 @@
 
     /** {@code max.task.idle.ms} */
     public static final String MAX_TASK_IDLE_MS_CONFIG = "max.task.idle.ms";
-    private static final String MAX_TASK_IDLE_MS_DOC = "Maximum amount of time in milliseconds a stream task will stay idle when not all of its partition buffers contain records," +
-        " to avoid potential out-of-order record processing across multiple input streams.";
+    private static final String MAX_TASK_IDLE_MS_DOC = "This config controls whether joins and merges"
+        + " may produce out-of-order results."
+        + " The config value is the maximum amount of time in milliseconds a stream task will stay idle"
+        + " when it is fully caught up on some (but not all) input partitions"
+        + " to wait for producers to send additional records and avoid potential"
+        + " out-of-order record processing across multiple input streams."
+        + " The default (zero) does not wait for producers to send more records,"
+        + " but it does wait to fetch data that is already present on the brokers."
+        + " This default means that for records that are already present on the brokers,"
+        + " Streams will process them in timestamp order."
+        + " The setting of -1 disables idling entirely and processes any locally available data,"

Review comment:
       ```suggestion
           + " Set to -1 to disable idling entirely and process any locally available data,"
   ```
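
For illustration only, the three cases described in the doc string above (a positive value, the default of zero, and -1) can be sketched with plain `java.util.Properties`. The class `TaskIdleConfigSketch` and its methods are hypothetical helpers, not part of Kafka; only the key `max.task.idle.ms` and the described behaviors come from the diff. Values below -1 are simply not modeled here.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of Kafka.
final class TaskIdleConfigSketch {
    // Config key as quoted in the diff above.
    static final String MAX_TASK_IDLE_MS = "max.task.idle.ms";

    // Builds a Properties object carrying the idle setting as a string value.
    static Properties withMaxTaskIdle(long idleMs) {
        Properties props = new Properties();
        props.setProperty(MAX_TASK_IDLE_MS, Long.toString(idleMs));
        return props;
    }

    // Paraphrases the behavior the new doc string assigns to each kind of value.
    static String describe(long idleMs) {
        if (idleMs < -1) {
            throw new IllegalArgumentException("values below -1 are not modeled in this sketch");
        }
        if (idleMs == -1) {
            return "idling disabled: process any locally available data";
        }
        if (idleMs == 0) {
            return "do not wait for producers, but wait to fetch data already present on the brokers";
        }
        return "idle up to " + idleMs + " ms waiting for producers to send additional records";
    }

    public static void main(String[] args) {
        for (long idleMs : new long[] {-1L, 0L, 500L}) {
            Properties props = withMaxTaskIdle(idleMs);
            System.out.println(props.getProperty(MAX_TASK_IDLE_MS) + " -> " + describe(idleMs));
        }
    }
}
```

In a real application the same key would be placed in the properties passed to Kafka Streams; the sketch avoids that dependency so it runs on its own.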





[GitHub] [kafka] vvcephei merged pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei merged pull request #11003:
URL: https://github.com/apache/kafka/pull/11003


   



[GitHub] [kafka] vvcephei commented on pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei commented on pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#issuecomment-876743838


   Thanks for the review, @JimGalasyn ! I've just reworded the main documentation section in response to your feedback. I also included a few other improvements I noticed on the second pass.



[GitHub] [kafka] vvcephei commented on a change in pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei commented on a change in pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#discussion_r666504937



##########
File path: streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java
##########
@@ -427,8 +427,18 @@
 
     /** {@code max.task.idle.ms} */
     public static final String MAX_TASK_IDLE_MS_CONFIG = "max.task.idle.ms";
-    private static final String MAX_TASK_IDLE_MS_DOC = "Maximum amount of time in milliseconds a stream task will stay idle when not all of its partition buffers contain records," +
-        " to avoid potential out-of-order record processing across multiple input streams.";
+    private static final String MAX_TASK_IDLE_MS_DOC = "This config controls whether joins and merges"
+        + " may produce out-of-order results."

Review comment:
       Yep. This is just string concatenation, so we need to add a space between "merges" and "may". I could have put them at the end, but the beginning makes it easier to spot if we're missing one.
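
As a small self-contained illustration of the point about concatenation (the class name `LeadingSpaceConcatSketch` is made up for this sketch; the two string fragments are the ones from the diff), a fragment without its leading space fuses "merges" and "may" into one word:

```java
// Hypothetical demo class; the string fragments are taken from the diff above.
final class LeadingSpaceConcatSketch {
    // Leading space on the continuation line keeps the words apart.
    static final String WITH_LEADING_SPACE = "This config controls whether joins and merges"
        + " may produce out-of-order results.";

    // Same fragments with the leading space dropped: the words run together.
    static final String WITHOUT_LEADING_SPACE = "This config controls whether joins and merges"
        + "may produce out-of-order results.";

    public static void main(String[] args) {
        System.out.println(WITH_LEADING_SPACE);
        System.out.println(WITHOUT_LEADING_SPACE);
    }
}
```

Putting the space at the start of each continuation line lines the separators up in one column, which is what makes a missing one easy to spot.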





[GitHub] [kafka] vvcephei commented on pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei commented on pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#issuecomment-876730215


   Note that I'll need to cherry-pick this PR to AK 3.0 once it's merged (cc @kkonstantine )



[GitHub] [kafka] vvcephei commented on pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei commented on pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#issuecomment-876725691


   Here are the rendered docs (just the sections I changed):
   ![task-idling-1](https://user-images.githubusercontent.com/832787/124986451-3a24a000-e001-11eb-9547-3241a177366a.png)
   ![task-idling-2](https://user-images.githubusercontent.com/832787/124986474-3f81ea80-e001-11eb-8ae0-8986d98e93be.png)
   ![task-idling-3_000](https://user-images.githubusercontent.com/832787/124986480-41e44480-e001-11eb-8816-4a28bf7552e0.png)
   



[GitHub] [kafka] JimGalasyn commented on a change in pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#discussion_r666504127



##########
File path: streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java
##########
@@ -427,8 +427,18 @@
 
     /** {@code max.task.idle.ms} */
     public static final String MAX_TASK_IDLE_MS_CONFIG = "max.task.idle.ms";
-    private static final String MAX_TASK_IDLE_MS_DOC = "Maximum amount of time in milliseconds a stream task will stay idle when not all of its partition buffers contain records," +
-        " to avoid potential out-of-order record processing across multiple input streams.";
+    private static final String MAX_TASK_IDLE_MS_DOC = "This config controls whether joins and merges"
+        + " may produce out-of-order results."

Review comment:
       Are these leading spaces expected?





[GitHub] [kafka] vvcephei commented on a change in pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei commented on a change in pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#discussion_r666503862



##########
File path: docs/streams/developer-guide/config-streams.html
##########
@@ -600,8 +613,34 @@ <h4><a class="toc-backref" href="#id33">default.windowed.value.serde.inner</a><a
           <span id="streams-developer-guide-max-task-idle-ms"></span><h4><a class="toc-backref" href="#id28">max.task.idle.ms</a><a class="headerlink" href="#max-task-idle-ms" title="Permalink to this headline"></a></h4>
           <blockquote>
             <div>
-              The maximum amount of time a task will idle without processing data when waiting for all of its input partition buffers to contain records. This can help avoid potential out-of-order
-              processing when the task has multiple input streams, as in a join, for example. Setting this to a nonzero value may increase latency but will improve time synchronization.
+              <p>
+                In a task with multiple input partitions (as in a join or merge), this is the amount
+                of time to block processing to wait for new data to be produced to a partition we

Review comment:
       I see what you mean. I also just noticed I've stacked up four prepositional phrases (!!!). I'll re-work it.





[GitHub] [kafka] vvcephei commented on a change in pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei commented on a change in pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#discussion_r666502523



##########
File path: docs/streams/developer-guide/config-streams.html
##########
@@ -241,8 +241,21 @@ <h4><a class="toc-backref" href="#id5">bootstrap.servers</a><a class="headerlink
           </tr>
           <tr class="row-even"><td>max.task.idle.ms</td>
             <td>Medium</td>
-            <td colspan="2">Maximum amount of time in milliseconds a stream task will stay idle while waiting for all partitions to contain data
-              and avoid potential out-of-order record processing across multiple input streams.</td>
+            <td colspan="2">
+              <p>
+                This config controls whether joins and merges may produce out-of-order results.
+                The config value is the maximum amount of time in milliseconds a stream task will stay idle
+                when it is fully caught up on some (but not all) input partitions
+                to wait for producers to send additional records and avoid potential
+                out-of-order record processing across multiple input streams.
+                The default (zero) does not wait for producers to send more records,
+                but it does wait to fetch data that is already present on the brokers.
+                This default means that for records that are already present on the brokers,
+                Streams will process them in timestamp order.
+                The setting of -1 disables idling entirely and processes any locally available data,

Review comment:
       Thanks!





[GitHub] [kafka] vvcephei commented on a change in pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
vvcephei commented on a change in pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#discussion_r667228608



##########
File path: docs/streams/developer-guide/config-streams.html
##########
@@ -600,8 +613,54 @@ <h4><a class="toc-backref" href="#id33">default.windowed.value.serde.inner</a><a
           <span id="streams-developer-guide-max-task-idle-ms"></span><h4><a class="toc-backref" href="#id28">max.task.idle.ms</a><a class="headerlink" href="#max-task-idle-ms" title="Permalink to this headline"></a></h4>
           <blockquote>
             <div>
-              The maximum amount of time a task will idle without processing data when waiting for all of its input partition buffers to contain records. This can help avoid potential out-of-order
-              processing when the task has multiple input streams, as in a join, for example. Setting this to a nonzero value may increase latency but will improve time synchronization.
+              <p>
+                This configuration controls how long Streams will wait to fetch data in order to
+                provide in-order processing semantics.
+              </p>
+              <p>
+                When processing a task has multiple input partitions (as in a join or merge),

Review comment:
       Ah, good catch. This was a casualty of refactoring. This is what I meant to say:
   
   ```suggestion
                   When processing a task that has multiple input partitions (as in a join or merge),
   ```





[GitHub] [kafka] JimGalasyn commented on a change in pull request #11003: KAFKA-12360: Document new time semantics

Posted by GitBox <gi...@apache.org>.
JimGalasyn commented on a change in pull request #11003:
URL: https://github.com/apache/kafka/pull/11003#discussion_r666502139



##########
File path: docs/streams/developer-guide/config-streams.html
##########
@@ -241,8 +241,21 @@ <h4><a class="toc-backref" href="#id5">bootstrap.servers</a><a class="headerlink
           </tr>
           <tr class="row-even"><td>max.task.idle.ms</td>
             <td>Medium</td>
-            <td colspan="2">Maximum amount of time in milliseconds a stream task will stay idle while waiting for all partitions to contain data
-              and avoid potential out-of-order record processing across multiple input streams.</td>
+            <td colspan="2">
+              <p>
+                This config controls whether joins and merges may produce out-of-order results.
+                The config value is the maximum amount of time in milliseconds a stream task will stay idle
+                when it is fully caught up on some (but not all) input partitions
+                to wait for producers to send additional records and avoid potential
+                out-of-order record processing across multiple input streams.
+                The default (zero) does not wait for producers to send more records,
+                but it does wait to fetch data that is already present on the brokers.
+                This default means that for records that are already present on the brokers,
+                Streams will process them in timestamp order.
+                The setting of -1 disables idling entirely and processes any locally available data,

Review comment:
       ```suggestion
                   Set to -1 to disable idling entirely and process any locally available data,
   ```
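
To make "out-of-order results" concrete, here is a deliberately simplified model of a merge over two input buffers of record timestamps. It is not Kafka Streams code: the class `MergeOrderSketch`, the method `mergeByTimestamp`, and the sample timestamps are all assumptions made for illustration. It only shows why processing locally available data before the other partition's records have been fetched can break timestamp order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Simplified model for illustration; not how Kafka Streams is implemented.
final class MergeOrderSketch {
    // Drains both buffers, always taking the record with the smaller head timestamp.
    static List<Long> mergeByTimestamp(Deque<Long> a, Deque<Long> b) {
        List<Long> out = new ArrayList<>();
        while (!a.isEmpty() || !b.isEmpty()) {
            if (b.isEmpty() || (!a.isEmpty() && a.peekFirst() <= b.peekFirst())) {
                out.add(a.pollFirst());
            } else {
                out.add(b.pollFirst());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both partitions' records are buffered before processing starts.
        Deque<Long> a = new ArrayDeque<>(Arrays.asList(1L, 4L));
        Deque<Long> b = new ArrayDeque<>(Arrays.asList(2L, 3L));
        System.out.println("waited for both: " + mergeByTimestamp(a, b));

        // Partition b's records have not been fetched yet, so a is processed alone first.
        List<Long> eager = new ArrayList<>();
        eager.addAll(mergeByTimestamp(new ArrayDeque<>(Arrays.asList(1L, 4L)), new ArrayDeque<>()));
        eager.addAll(mergeByTimestamp(new ArrayDeque<>(), new ArrayDeque<>(Arrays.asList(2L, 3L))));
        System.out.println("processed eagerly: " + eager);
    }
}
```

In this model, the first case corresponds to waiting until data from both inputs is available, and the second to processing whatever is available locally, which is the trade-off the -1 setting is described as making.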



