You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/06/29 21:06:06 UTC

[GitHub] [kafka] guozhangwang opened a new pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

guozhangwang opened a new pull request #10941:
URL: https://github.com/apache/kafka/pull/10941


   Also update the upgrade guide indicating about the grace period KIP and its indication on the fix with throughput impact.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] guozhangwang commented on a change in pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
guozhangwang commented on a change in pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#discussion_r670119642



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
+        This period determines how long after a window ends any late arrived records will still be processed.
+        Records coming in after the grace period has elapsed will be dropped from those windows.
+        With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
+        e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
+        Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;

Review comment:
       `TimeDiffernce` is only in `JoinWindows`.

##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.

Review comment:
       Since we are piggy-backing the fix on KIP-663 now, I want to incorporate the change along with this PR.

##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
+        This period determines how long after a window ends any late arrived records will still be processed.
+        Records coming in after the grace period has elapsed will be dropped from those windows.
+        With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
+        e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
+        Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;
+        the other setters such as <code>TimeWindows#grace</code> are deprecated and will be removed in the future.
+        Also when the new construct API are used for left/outer stream-stream joins, Kafka Streams would fix emitting spurious join results which may have an impact on the throughput.

Review comment:
       Ack!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on a change in pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#discussion_r661976703



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
+        This period determines how long after a window ends any late arrived records will still be processed.
+        Records coming in after the grace period has elapsed will be dropped from those windows.
+        With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
+        e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
+        Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;

Review comment:
       It's also `TimeDiffernce`, not `Size`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on a change in pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#discussion_r661973607



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
+        This period determines how long after a window ends any late arrived records will still be processed.

Review comment:
       `late` -> `out-of-order`
   
   Record are "late" by definition if they are dropped because they arrive after the grace period passed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] guozhangwang merged pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
guozhangwang merged pull request #10941:
URL: https://github.com/apache/kafka/pull/10941


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] guozhangwang commented on pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
guozhangwang commented on pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#issuecomment-870917133


   ping @mjsax @spena .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on a change in pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#discussion_r661977132



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
+        This period determines how long after a window ends any late arrived records will still be processed.
+        Records coming in after the grace period has elapsed will be dropped from those windows.
+        With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
+        e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
+        Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;
+        the other setters such as <code>TimeWindows#grace</code> are deprecated and will be removed in the future.
+        Also when the new construct API are used for left/outer stream-stream joins, Kafka Streams would fix emitting spurious join results which may have an impact on the throughput.

Review comment:
       `would fix emitting spurious join results` -- not sure if users would understand this.
   
   Maybe better:
   ```
   In older versions, Kafka Streams emitted stream-stream left/outer join results eagerly.
   This behavior may lead to spurious left/outer join result records.
   In this release, we changed the behavior to avoid spurious results and
   left/outer join result are only emitted after the join window is closed,
   i.e., after the grace period elapsed. To maintain backward compatibility,
   the old API <code>JoinWindows.of(size)</code> preserves the old eager-emit behavior
   and only the new API `<code>JoinWindows.ofTimeDifferenceAndGrace()</code>
   (and <code>JoinsWindows#ofTimeDifferenceNoGrace</code>) enable the new behavior.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] showuon commented on a change in pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
showuon commented on a change in pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#discussion_r661329233



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
+        This period determines how long after a window ends any late arrived records will still be processed.
+        Records coming in after the grace period has elapsed will be dropped from those windows.
+        With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
+        e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
+        Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;

Review comment:
       I think you're referring `TimeWindows#ofSizeAndGrace`, so that user can set the grace value.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on a change in pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#discussion_r661975340



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.

Review comment:
       Why do we include this in this PR? Seems it should be part of KIP-663 doc updates? \cc @izzyacademy 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] guozhangwang commented on pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
guozhangwang commented on pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#issuecomment-880901349


   Merged to trunk; cherry-picked to 3.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on a change in pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#discussion_r670688025



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
+        This period determines how long after a window ends any late arrived records will still be processed.
+        Records coming in after the grace period has elapsed will be dropped from those windows.
+        With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
+        e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
+        Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;

Review comment:
       In the old API,  yes, but KIP-633 change it from `size` to `timeDifference` (https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Drop+24+hour+default+of+grace+period+in+Streams)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on a change in pull request #10941: KAFKA-10847: Remove internal config for enabling the fix

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #10941:
URL: https://github.com/apache/kafka/pull/10941#discussion_r661974362



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
     <p>
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
+    <p>
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
+        This period determines how long after a window ends any late arrived records will still be processed.
+        Records coming in after the grace period has elapsed will be dropped from those windows.
+        With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
+        e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.

Review comment:
       `lso` ? Should it be `also`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org