You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/02/09 20:26:53 UTC

[GitHub] [kafka] MarcoLotz opened a new pull request #10091: KAFKA-9524: increase retention time for window and grace periods longer than one day

MarcoLotz opened a new pull request #10091:
URL: https://github.com/apache/kafka/pull/10091


   **Reproducing:**
   The bug can be easily reproduced for any TimeWindow where window + grace period > 1 day. Changing any test in TimeWindowedKStreamImplTest.java for this condition will reproduce the bug.
   
   **Cause:**
   The root cause is that .grace(...) never updates the default maintainDurationMs field value. The value is thus always 1 day - throwing the IllegalArgumentException when validating it at a later stage.
   
   **Implementation comments:**
   I believe that a minimum retention period of 1 day is desired because of log compaction - thus I used the Math.Max(window+grace, default maintainDurationMs-1 day-) to calculate the minimum retention period.
   
   I am a bit unsure about the test - currently it indirectly tests the bug. The bug would throw a IllegalArgumentException, preventing the test scenario from working with any aggregation (count, reduce, etc). I've implemented the test with count operation to ensure consistency of the window behaviour. If this is not required, maybe just calling processData() and asserting that the original reported exception is not thrown should be enough. 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] vcrfxia commented on a change in pull request #10091: KAFKA-9524: increase retention time for window and grace periods longer than one day

Posted by GitBox <gi...@apache.org>.
vcrfxia commented on a change in pull request #10091:
URL: https://github.com/apache/kafka/pull/10091#discussion_r573909774



##########
File path: streams/src/test/java/org/apache/kafka/streams/kstream/internals/TimeWindowedKStreamImplTest.java
##########
@@ -91,6 +92,47 @@ public void shouldCountWindowed() {
             equalTo(ValueAndTimestamp.make(1L, 500L)));
     }
 
+    @Test
+    public void shouldSupportWindowAndGracePeriodLongerThanDefaultRetentionTime() {

Review comment:
       It's odd to me that test coverage is being added in this file rather than adding unit tests in `TimeWindowsTest` directly. I understand that this is where the original error message was being thrown but it seems it should be the responsibility of `TimeWindows` to properly implement `size()`, `gracePeriodMs()`, and `maintainMs()`, and `TimeWindowedKStreamImpl` should call those directly without needing duplicate tests for different combinations of those values.
   
   WDYT? I'm new to this repo so I defer to @mjsax for more authoritative comments on test philosophy :) 

##########
File path: streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindows.java
##########
@@ -214,7 +216,10 @@ public long gracePeriodMs() {
         // NOTE: in the future, when we remove maintainMs,
         // we should default the grace period to 24h to maintain the default behavior,
         // or we can default to (24h - size) if you want to be super accurate.
-        return graceMs != -1 ? graceMs : maintainMs() - size();
+        if (graceMs != EMPTY_GRACE_PERIOD) {
+            return graceMs;
+        }
+        return maintainDurationMs > sizeMs ? maintainDurationMs - sizeMs : 0;

Review comment:
       nit: how about
   ```suggestion
           return Math.max(maintainDurationMs - sizeMs, 0);
   ```
   for a slight improvement in readability?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] vcrfxia commented on pull request #10091: KAFKA-9524: increase retention time for window and grace periods longer than one day

Posted by GitBox <gi...@apache.org>.
vcrfxia commented on pull request #10091:
URL: https://github.com/apache/kafka/pull/10091#issuecomment-776412900


   Rather than having the `grace(...)` method update `maintainDurationMs`, would it be better to update `maintainMs()` to return `Math.max(maintainDurationMs, sizeMs + graceMs)` rather than `Math.max(maintainDurationMs, sizeMs)`?
   https://github.com/apache/kafka/blob/7583e14fb20b34a044d92dcf6b078456bc4f6903/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindows.java#L247-L249
   
   This feels more consistent with the fact that `maintainDurationMs` currently not only does not reflect grace period, but also does not reflect window size.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] MarcoLotz commented on pull request #10091: KAFKA-9524: increase retention time for window and grace periods longer than one day

Posted by GitBox <gi...@apache.org>.
MarcoLotz commented on pull request #10091:
URL: https://github.com/apache/kafka/pull/10091#issuecomment-776643166


   @vcrfxia I see your point. IMHO usually I refrain from having get() and set() (or equivalent methods) doing tasks other than just accessing fields. In the scenario of some extra processing, I usually change the behaviour name to depict that (e.g. getEffectiveMaintainMs)- so that a reader can guess that there's something going on more than accessing just maintainDurationMs fields.
   
   Having said that, this is an original implementer decision and I see that other methods like gracePeriodMs() also perform computations when returning values - so I will just comply with the coding style here.
   
   I saw that there's a kind of magic number on grace periods, that is -1 when not specified. Because of this, the implementation to just 
   ```java
   public long maintainMs() { 
       Math.max(maintainDurationMs, sizeMs + graceMs);
   }
   ```
   would fail for windows bigger than one day and with no grace period, returning the maintainSizeValue of windowsSize - 1 ms instead of windowsSize.
   
   I have updated the PR to follow your suggestions and coping with the problem mentioned above.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] MarcoLotz edited a comment on pull request #10091: KAFKA-9524: increase retention time for window and grace periods longer than one day

Posted by GitBox <gi...@apache.org>.
MarcoLotz edited a comment on pull request #10091:
URL: https://github.com/apache/kafka/pull/10091#issuecomment-776945708


   @vcrfxia Nice one! I performed the suggested the changes. Thanks to your test comments and the fact that now we perform the computation on maintainMs() method, I could change the testing approach to directly test the bug with three lines of code in TimeWindowsTest instead. This preserves the separation of concerns between the tests. Updated the test file.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax merged pull request #10091: KAFKA-9524: increase retention time for window and grace periods longer than one day

Posted by GitBox <gi...@apache.org>.
mjsax merged pull request #10091:
URL: https://github.com/apache/kafka/pull/10091


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] MarcoLotz edited a comment on pull request #10091: KAFKA-9524: increase retention time for window and grace periods longer than one day

Posted by GitBox <gi...@apache.org>.
MarcoLotz edited a comment on pull request #10091:
URL: https://github.com/apache/kafka/pull/10091#issuecomment-776945708


   @vcrfxia Nice one! I performed the suggested the changes. Thanks to your test comments and the fact that now we perform the computation on maintainMs() method, I could change the testing approach to directly test the bug with two lines of code in TimeWindowsTest instead. This preserves the separation of concerns between the tests. Updated the test file.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] MarcoLotz commented on pull request #10091: KAFKA-9524: increase retention time for window and grace periods longer than one day

Posted by GitBox <gi...@apache.org>.
MarcoLotz commented on pull request #10091:
URL: https://github.com/apache/kafka/pull/10091#issuecomment-776945708


   @vcrfxia Nice one! I performed the suggested the changes. Thanks to your test comments and the fact that now we perform the computation on maintainMs() method, I could change the testing approach to directly test the bug with two lines of code in TimeWindowsTest instead. This conserves the separation of concerns between the tests. Updated the test file.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on pull request #10091: KAFKA-9524: increase retention time for window and grace periods longer than one day

Posted by GitBox <gi...@apache.org>.
mjsax commented on pull request #10091:
URL: https://github.com/apache/kafka/pull/10091#issuecomment-781777265


   Thanks for the PR @MarcoLotz! And congrats to your first code contribution!
   
   Thanks for reviewing @vcrfxia!
   
   Merged to `trunk` and cherry-picked to `2.8` branch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org