You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/07/23 03:59:29 UTC

[GitHub] [kafka] ableegoldman opened a new pull request #11114: KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback

ableegoldman opened a new pull request #11114:
URL: https://github.com/apache/kafka/pull/11114


   There were a few followup things to address from [#10926](https://github.com/apache/kafka/pull/10926), most importantly a number of fixes needed for the javadocs. Beyond that it's mostly just adding a few missing verification checks.
   
   Given the whole point of this KIP was to help reduce a major source of confusion, the meaning and usage of grace period within Streams, it's critical that we have clear and correct javadocs accompanying the new APIs. For that reason I think it's very important to get this into 3.0 -- it's also very low-risk, as the only non-docs changes are adding a handful of checks that already exist in the old APIs and were just missed in the new APIs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] kkonstantine commented on pull request #11114: KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback

Posted by GitBox <gi...@apache.org>.
kkonstantine commented on pull request #11114:
URL: https://github.com/apache/kafka/pull/11114#issuecomment-885803101


   Approved for inclusion to 3.0 as well (mentioned on the list too). 
   Thanks @ableegoldman 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ableegoldman commented on pull request #11114: KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback

Posted by GitBox <gi...@apache.org>.
ableegoldman commented on pull request #11114:
URL: https://github.com/apache/kafka/pull/11114#issuecomment-885513066


   > we should update that part (starting in line 138) as well, indicating the new default is now zero. Could you piggy-back that in this PR?
   
   Will do -- thanks for the tip. (Though technically, the new default is not zero -- it's just that there is no default now. You either explicitly choose the API that sets it to zero or else the one that lets you set it yourself)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on a change in pull request #11114: KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #11114:
URL: https://github.com/apache/kafka/pull/11114#discussion_r675707618



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -135,25 +135,30 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
     <p>
-        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
-        This period determines how long after a window ends any out-of-order records will still be processed.
-        Records coming in after the grace period has elapsed will be dropped from those windows.
-        With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
-        e.g. suppression operators with the default won't emit results up for 24 hours, while in practice out-of-order data usually has a much smaller time-skewness.
-        Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeAndGrace</code> to let callers always set it upon constructing the windows;
-        the other setters such as <code>TimeWindows#grace</code> are deprecated and will be removed in the future.
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or
+        stream-stream joins. This period determines how long after a window ends any out-of-order records will still
+        be processed. Records coming in after the grace period has elapsed are considered late and will be dropped.
+        But in operators such as suppression, a large grace period has the drawback of incurring an equally large
+        output latency. The current API made it all too easy to miss the grace period config completely, leading you
+        to wonder why your application seems to produce no output -- it actually is, but not for 24 hours.
+        <p>
+        To prevent accidentally or unknowingly falling back to the default 24hr grace period, we deprecated all of the
+        existing static constructors for the <code>indows</code> classes (such as <code>TimeWindows#of</code>). These
+        are replaced by new static constructors of two flavors: <code>#ofSizeAndGrace</code> and <code>#ofSizeWithNoGrace</code>
+        (these are for the <code>TimeWindows</code> class; analogous APIs exist for the <code>JoinWindows</code>,

Review comment:
       For JoinWindows and left/outer join, we also enable the spurious result fix (as pointed out in the next paragraph -- might be good to call out here, too, by referring to the next paragraph).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ableegoldman merged pull request #11114: KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback

Posted by GitBox <gi...@apache.org>.
ableegoldman merged pull request #11114:
URL: https://github.com/apache/kafka/pull/11114


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] guozhangwang commented on a change in pull request #11114: KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback

Posted by GitBox <gi...@apache.org>.
guozhangwang commented on a change in pull request #11114:
URL: https://github.com/apache/kafka/pull/11114#discussion_r675318036



##########
File path: streams/src/main/java/org/apache/kafka/streams/kstream/JoinWindows.java
##########
@@ -102,49 +102,61 @@ private JoinWindows(final long beforeMs,
 
     /**
      * Specifies that records of the same key are joinable if their timestamps are within {@code timeDifference},
-     * i.e., the timestamp of a record from the secondary stream is max {@code timeDifference} earlier or later than
-     * the timestamp of the record from the primary stream. Using the method explicitly sets the grace period to
-     * the duration specified by {@code afterWindowEnd} which means that out of order records arriving
-     * after the window end will be dropped. The delay is defined as (stream_time - record_timestamp).
+     * i.e., the timestamp of a record from the secondary stream is max {@code timeDifference} before or after
+     * the timestamp of the record from the primary stream.
+     * <p>
+     * Using this method explicitly sets the grace period to the duration specified by {@code afterWindowEnd}, which
+     * means that only out-of-order records arriving more than the grace period after the window end will be dropped.
+     * The window close, after which any incoming records are considered late and will be rejected, is defined as
+     * {@code windowEnd + afterWindowEnd}
      *
      * @param timeDifference join window interval
      * @param afterWindowEnd The grace period to admit out-of-order events to a window.
-     * @throws IllegalArgumentException if the {@code afterWindowEnd} is negative of can't be represented as {@code long milliseconds}
      * @return A new JoinWindows object with the specified window definition and grace period
+     * @throws IllegalArgumentException if {@code timeDifference} is negative or can't be represented as {@code long milliseconds}
+     *                                  if the {@code afterWindowEnd} is negative or can't be represented as {@code long milliseconds}
      */
     public static JoinWindows ofTimeDifferenceAndGrace(final Duration timeDifference, final Duration afterWindowEnd) {
-        return new JoinWindows(timeDifference.toMillis(), timeDifference.toMillis(), afterWindowEnd.toMillis(), true);
+        final String timeDifferenceMsgPrefix = prepareMillisCheckFailMsgPrefix(timeDifference, "timeDifference");

Review comment:
       Thanks for the catch!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on a change in pull request #11114: KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback

Posted by GitBox <gi...@apache.org>.
mjsax commented on a change in pull request #11114:
URL: https://github.com/apache/kafka/pull/11114#discussion_r675706665



##########
File path: docs/streams/upgrade-guide.html
##########
@@ -135,25 +135,30 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
         We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
     </p>
     <p>
-        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
-        This period determines how long after a window ends any out-of-order records will still be processed.
-        Records coming in after the grace period has elapsed will be dropped from those windows.
-        With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
-        e.g. suppression operators with the default won't emit results up for 24 hours, while in practice out-of-order data usually has a much smaller time-skewness.
-        Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeAndGrace</code> to let callers always set it upon constructing the windows;
-        the other setters such as <code>TimeWindows#grace</code> are deprecated and will be removed in the future.
+        We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or
+        stream-stream joins. This period determines how long after a window ends any out-of-order records will still
+        be processed. Records coming in after the grace period has elapsed are considered late and will be dropped.
+        But in operators such as suppression, a large grace period has the drawback of incurring an equally large
+        output latency. The current API made it all too easy to miss the grace period config completely, leading you
+        to wonder why your application seems to produce no output -- it actually is, but not for 24 hours.
+        <p>
+        To prevent accidentally or unknowingly falling back to the default 24hr grace period, we deprecated all of the
+        existing static constructors for the <code>indows</code> classes (such as <code>TimeWindows#of</code>). These

Review comment:
       ```suggestion
           existing static constructors for the <code>windows</code> classes (such as <code>TimeWindows#of</code>). These
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] izzyacademy commented on pull request #11114: KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback

Posted by GitBox <gi...@apache.org>.
izzyacademy commented on pull request #11114:
URL: https://github.com/apache/kafka/pull/11114#issuecomment-885903272


   Thanks for the clarification @ableegoldman you have added to the javadocs as well as the code changes
   
   I am glad you were able to work on it in time to make it into the 3.0 release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ableegoldman commented on pull request #11114: KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback

Posted by GitBox <gi...@apache.org>.
ableegoldman commented on pull request #11114:
URL: https://github.com/apache/kafka/pull/11114#issuecomment-885958001


   Merged to trunk and cherrypicked to 3.0 (cc @kkonstantine )


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org