Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/14 01:52:04 UTC

[GitHub] [hudi] wangxianghu commented on a change in pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

wangxianghu commented on a change in pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#discussion_r556990030



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
##########
@@ -165,6 +169,7 @@ public KafkaOffsetGen(TypedProperties props) {
     }
     DataSourceUtils.checkRequiredProperties(props, Collections.singletonList(Config.KAFKA_TOPIC_NAME));
     topicName = props.getString(Config.KAFKA_TOPIC_NAME);
+    kafkaCheckpointTimestamp = props.getString(Config.KAFKA_CHECKPOINT_TIMESTAMP);

Review comment:
       If `Config.KAFKA_CHECKPOINT_TIMESTAMP` is not set, `props.getString` throws an exception. That is not expected when the user wants to checkpoint by providing offsets instead of a timestamp.
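
One way to avoid the exception is to treat the timestamp as optional. The following is a minimal sketch, not the PR's code: it uses `java.util.Properties` as a stand-in for `TypedProperties`, and the class name, key string, and `readCheckpointTimestamp` helper are all hypothetical names chosen for illustration.

```java
import java.util.Optional;
import java.util.Properties;

class OptionalTimestampConfig {
    // Hypothetical key string, used only for this illustration.
    public static final String KAFKA_CHECKPOINT_TIMESTAMP = "example.kafka.checkpoint.timestamp";

    // Returns the configured timestamp, or empty when the key is absent or blank,
    // so an offsets-only configuration does not fail at construction time.
    public static Optional<Long> readCheckpointTimestamp(Properties props) {
        String raw = props.getProperty(KAFKA_CHECKPOINT_TIMESTAMP); // null when the key is missing
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Long.parseLong(raw.trim()));
    }

    public static void main(String[] args) {
        // Offsets-only user: the timestamp key is never set.
        Properties offsetsOnly = new Properties();
        System.out.println(readCheckpointTimestamp(offsetsOnly).isPresent());

        // Timestamp user: the key is set to an epoch-millisecond value.
        Properties withTimestamp = new Properties();
        withTimestamp.setProperty(KAFKA_CHECKPOINT_TIMESTAMP, "1610589124000");
        System.out.println(readCheckpointTimestamp(withTimestamp).isPresent());
    }
}
```

With this shape, the later `kafkaCheckpointTimestamp != null` check would become an `isPresent()` check, and a missing key no longer breaks the offsets-based path.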

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
##########
@@ -182,6 +187,10 @@ public KafkaOffsetGen(TypedProperties props) {
               .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet());
 
       // Determine the offset ranges to read from
+      if (kafkaCheckpointTimestamp != null) {
+        lastCheckpointStr = Option.of(getOffsetsByTimestamp(consumer, partitionInfoList, topicName, Long.parseLong(kafkaCheckpointTimestamp)));
+      }
+
       if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) {

Review comment:
       Here we cannot simply overwrite `lastCheckpointStr`. If the user has configured `Config.KAFKA_CHECKPOINT_TIMESTAMP`, Hudi will resolve offsets from that timestamp on every run, discard the checkpoint from the previous commit, and never move past that point, right?
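
A possible fix is to give an existing checkpoint precedence and fall back to the timestamp only on the first run. This is a sketch under assumptions, not the PR's implementation: `resolveStart` and the class name are hypothetical, `java.util.Optional` stands in for Hudi's `Option`, and the checkpoint strings are sample values.

```java
import java.util.Optional;

class CheckpointPrecedence {
    // Hypothetical helper that picks the offsets to resume from.
    // lastCheckpoint:   checkpoint string committed by an earlier run, if any.
    // timestampOffsets: offsets resolved from the configured timestamp, if any.
    public static Optional<String> resolveStart(Optional<String> lastCheckpoint,
                                                Optional<String> timestampOffsets) {
        // A non-empty committed checkpoint always wins, so the job keeps advancing.
        if (lastCheckpoint.isPresent() && !lastCheckpoint.get().isEmpty()) {
            return lastCheckpoint;
        }
        // Otherwise this is the first run: start from the timestamp, if one was given.
        return timestampOffsets;
    }

    public static void main(String[] args) {
        Optional<String> fromTimestamp = Optional.of("topic,0:100,1:100");

        // First run: nothing committed yet, so the timestamp-derived offsets are used.
        System.out.println(resolveStart(Optional.empty(), fromTimestamp).get());

        // Later run: the committed checkpoint is used and the timestamp is ignored.
        System.out.println(resolveStart(Optional.of("topic,0:500,1:480"), fromTimestamp).get());
    }
}
```

The design choice here is that the timestamp only seeds the very first read; whether a user should also be able to force a reset to the timestamp is a separate question this sketch does not address.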



