You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/03/19 13:47:50 UTC

[GitHub] [kafka] cadonna opened a new pull request #10360: KAFKA-12508: Emit records with same value and same timestamp

cadonna opened a new pull request #10360:
URL: https://github.com/apache/kafka/pull/10360


   Emit on change introduced in Streams with KIP-557 might lead to
   data loss if a record is put into a source KTable and emitted
   downstream and then a failure happens before the offset could be
   committed. After Streams rereads the record, it would find a record
   with the same key, value and timestamp in the KTable (i.e. the same
   record that was put into the KTable before the failure) and not
   forward it downstreams. Hence, the record would never be processed
   downstream of the KTable which breaks at-least-once and exactly-once
   processing guarantees.
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ableegoldman commented on pull request #10360: KAFKA-12508: Emit records with same value and same timestamp

Posted by GitBox <gi...@apache.org>.
ableegoldman commented on pull request #10360:
URL: https://github.com/apache/kafka/pull/10360#issuecomment-804340826


   Cherrypicked to 2.8 cc @vvcephei 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] cadonna commented on pull request #10360: KAFKA-12508: Emit records with same value and same timestamp

Posted by GitBox <gi...@apache.org>.
cadonna commented on pull request #10360:
URL: https://github.com/apache/kafka/pull/10360#issuecomment-802846262


   Call for review: @vvcephei, @ableegoldman, @mimaison


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ableegoldman commented on pull request #10360: KAFKA-12508: Emit records with same value and same timestamp

Posted by GitBox <gi...@apache.org>.
ableegoldman commented on pull request #10360:
URL: https://github.com/apache/kafka/pull/10360#issuecomment-803176659


   One unrelated test failure: `kafka.server.ListOffsetsRequestTest.testResponseIncludesLeaderEpoch()`
   
   Merging now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ableegoldman commented on pull request #10360: KAFKA-12508: Emit records with same value and same timestamp

Posted by GitBox <gi...@apache.org>.
ableegoldman commented on pull request #10360:
URL: https://github.com/apache/kafka/pull/10360#issuecomment-803189198


   Merged to trunk and cherrypicked back to 2.6 & 2.7. Waiting for @vvcephei to approve this blocker before merging to 2.8
   
   The integration test was relying on the new uncaught exception handler which does not exist prior to 2.8, so I replaced that with just using two StreamThreads. This should have the exact same effect


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ableegoldman commented on a change in pull request #10360: KAFKA-12508: Emit records with same value and same timestamp

Posted by GitBox <gi...@apache.org>.
ableegoldman commented on a change in pull request #10360:
URL: https://github.com/apache/kafka/pull/10360#discussion_r598007541



##########
File path: streams/src/main/java/org/apache/kafka/streams/state/internals/ValueAndTimestampSerializer.java
##########
@@ -105,7 +105,7 @@ public void close() {
     }
 
     private static boolean timeIsDecreasing(final byte[] oldRecord, final byte[] newRecord) {
-        return extractTimestamp(newRecord) < extractTimestamp(oldRecord);
+        return extractTimestamp(newRecord) <= extractTimestamp(oldRecord);

Review comment:
       This is a bit unfortunate since it essentially breaks the emit-on-change semantics 😕  I guess it should be relatively rare for no-op updates to come in with the same timestamp, but this still seems like kind of a structural failure of Kafka Streams. We shouldn't need to assume that if we find an identical record in the state store then we have to forward it on the off-chance it was (1) actually the same record, and (2) was only partially processed when we happened across an unexpected exception.
   
   I'm not saying I have a better idea that could be implemented quickly & safely given the releases we're blocking, but we should at least file a ticket so users are aware of this flaw. Otherwise I imagine we might be getting bug reports that emit-on-change doesn't work. It's possible some of the orthogonal work that's been discussed in the past will end up fixing this on the side (eg buffering updates before commit, versioned tables, etc), whenever we finally get around to any of that




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ableegoldman merged pull request #10360: KAFKA-12508: Emit records with same value and same timestamp

Posted by GitBox <gi...@apache.org>.
ableegoldman merged pull request #10360:
URL: https://github.com/apache/kafka/pull/10360


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org