Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/22 14:51:59 UTC

[GitHub] [pinot] sajjad-moradi commented on a diff in pull request #8321: Handle out of range in KafkaConsumer

sajjad-moradi commented on code in PR #8321:
URL: https://github.com/apache/pinot/pull/8321#discussion_r903853686


##########
pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/src/main/java/org/apache/pinot/plugin/stream/kafka20/KafkaPartitionLevelConsumer.java:
##########
@@ -55,7 +58,12 @@ public MessageBatch<byte[]> fetchMessages(long startOffset, long endOffset, int
       LOGGER.debug("poll consumer: {}, startOffset: {}, endOffset:{} timeout: {}ms", _topicPartition, startOffset,
           endOffset, timeoutMillis);
     }
-    _consumer.seek(_topicPartition, startOffset);
+    Map<TopicPartition, Long> beginningOffsets = _consumer.beginningOffsets(Lists.newArrayList(_topicPartition));
+    Long beginningOffset = beginningOffsets.values().iterator().next();
+    // explicitly check for OutOfRange, where startOffset < beginningOffset
+    // without this, _consumer.poll will auto offset reset to latest, resulting in data loss
+    _consumer.seek(_topicPartition, Math.max(startOffset, beginningOffset));

Review Comment:
   We're adding one more call to Kafka in the execution path of every happy case in order to fix a rare edge case. If the Kafka consumer doesn't throw an exception in the out-of-range scenario, maybe we should check the fetched messages instead: if no messages come back, get the beginning offset, seek to it, and then fetch again?
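
   A minimal sketch of that idea, not the actual Pinot or Kafka code. `PartitionReader` and `FakeReader` are hypothetical stand-ins for the three consumer calls involved (`seek`, `poll`, `beginningOffsets`), and the fake assumes that an out-of-range fetch returns an empty batch, which is the premise of the suggestion rather than something verified against the Kafka client.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyOffsetResetSketch {

  /** Hypothetical stand-in for the consumer calls used here; this is NOT the Kafka client API. */
  interface PartitionReader {
    void seek(long offset);

    /** Returns the offsets of the fetched records. */
    List<Long> poll();

    /** Models the extra round trip to the broker that the review comment wants to avoid. */
    long beginningOffset();
  }

  static List<Long> fetchMessages(PartitionReader reader, long startOffset) {
    reader.seek(startOffset);
    List<Long> batch = reader.poll();
    if (!batch.isEmpty()) {
      // Happy path: data came back, so no extra broker call is made.
      return batch;
    }
    // Empty batch: only now pay for the beginning-offset lookup.
    long beginningOffset = reader.beginningOffset();
    if (startOffset >= beginningOffset) {
      // The start offset is still retained; there is simply no new data yet.
      return batch;
    }
    // The start offset fell below the retained range: seek forward and fetch again.
    reader.seek(beginningOffset);
    return reader.poll();
  }

  /** In-memory fake that retains offsets in [beginning, end). Hypothetical, for illustration only. */
  static final class FakeReader implements PartitionReader {
    private final long _beginning;
    private final long _end;
    private long _position;
    int beginningOffsetCalls = 0;

    FakeReader(long beginning, long end) {
      _beginning = beginning;
      _end = end;
      _position = beginning;
    }

    @Override
    public void seek(long offset) {
      _position = offset;
    }

    @Override
    public List<Long> poll() {
      List<Long> batch = new ArrayList<>();
      if (_position < _beginning) {
        // Assumption: an out-of-range position yields an empty batch instead of an exception.
        return batch;
      }
      for (long offset = _position; offset < _end; offset++) {
        batch.add(offset);
      }
      _position = Math.max(_position, _end);
      return batch;
    }

    @Override
    public long beginningOffset() {
      beginningOffsetCalls++;
      return _beginning;
    }
  }

  public static void main(String[] args) {
    FakeReader reader = new FakeReader(100, 103);
    System.out.println("in range:     " + fetchMessages(reader, 101));
    System.out.println("out of range: " + fetchMessages(reader, 50));
    System.out.println("beginningOffset calls: " + reader.beginningOffsetCalls);
  }
}
```

   The trade-off is that the lookup also runs whenever a poll is legitimately empty (a caught-up consumer), and if the consumer's auto offset reset can return records from the latest offset within the same poll, the empty-batch check would not catch the data loss.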



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

