You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/09 16:44:06 UTC

[GitHub] [hudi] Litianye opened a new pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Litianye opened a new pull request #1719:
URL: https://github.com/apache/hudi/pull/1719


   ## What is the purpose of the pull request
   This pull request fix deltastreamer use kafkasource (such as JsonKafkaSource / AvroKafkaSource)  with offset reset strategy:latest can't consume data because the checkpoint string store in .commit file .commit file will always be an empty string.
   
   For example, i want to inject data from kafka into a new hudi table. 
   From `org.apache.hudi.utilities.deltastreamer.DeltaSync#readFromSource`, the first time consume `resumeCheckpointStr` will be `Option.empty()`, and the `lastCkptStr` used in `org.apache.hudi.utilities.sources.Source#fetchNewData` will also be `Option.empty()`.
   Fetch new data code like this:
   `protected InputBatch<JavaRDD<GenericRecord>> fetchNewData(Option<String> lastCheckpointStr, long sourceLimit) {
       OffsetRange[] offsetRanges = offsetGen.getNextOffsetRanges(lastCheckpointStr, sourceLimit);
       long totalNewMsgs = CheckpointUtils.totalNewMessages(offsetRanges);
       if (totalNewMsgs <= 0) {
         return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? lastCheckpointStr.get() : "");
       } else {
         LOG.info("About to read " + totalNewMsgs + " from Kafka for topic :" + offsetGen.getTopicName());
       }
       JavaRDD<GenericRecord> newDataRDD = toRDD(offsetRanges);
       return new InputBatch<>(Option.of(newDataRDD), KafkaOffsetGen.CheckpointUtils.offsetsToStr(offsetRanges));
     }`
   
   When get next offset ranges in `org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#getNextOffsetRanges` code like this:
   `// Determine the offset ranges to read from
         if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) {
           fromOffsets = checkupValidOffsets(consumer, lastCheckpointStr, topicPartitions);
         } else {
           KafkaResetOffsetStrategies autoResetValue = KafkaResetOffsetStrategies
                   .valueOf(props.getString("auto.offset.reset", Config.DEFAULT_AUTO_RESET_OFFSET.toString()).toUpperCase());
           switch (autoResetValue) {
             case EARLIEST:
               fromOffsets = consumer.beginningOffsets(topicPartitions);
               break;
             case LATEST:
               fromOffsets = consumer.endOffsets(topicPartitions);
               break;
             default:
               throw new HoodieNotSupportedException("Auto reset value must be one of 'earliest' or 'latest' ");
           }
         }
   
         // Obtain the latest offsets.
         toOffsets = consumer.endOffsets(topicPartitions);`
   
   Because `lastCkptStr` is `Option.empty()`, fromOffsets and toOffsets all will be consumer's endOffsets, `totalNewMsgs` size is 0 and first time checkpoint string return value is an empty string. Next consume operation will get this empty string checkpoint, in `KafkaOffsetGen` offset range will always be handled to reset as latest and return another empty string checkpoint.
   
   By watching, checkpoint will be normal only if kafka latest offset change between `fromOffsets` and `toOffsets` get end offset value.
   
   ## Brief change log
   - Modify checkpoint string return value of JsonKafkaSource & AvroKafkaSource method fetchNewData(), when offsetRanges total message count <= 0.
   
   ## Verify this pull request
   This pull request is already covered by existing tests, such as : 
   Run TestKafkaSource
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Litianye commented on a change in pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
Litianye commented on a change in pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#discussion_r438058262



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java
##########
@@ -57,10 +57,10 @@ public AvroKafkaSource(TypedProperties props, JavaSparkContext sparkContext, Spa
   protected InputBatch<JavaRDD<GenericRecord>> fetchNewData(Option<String> lastCheckpointStr, long sourceLimit) {
     OffsetRange[] offsetRanges = offsetGen.getNextOffsetRanges(lastCheckpointStr, sourceLimit);
     long totalNewMsgs = CheckpointUtils.totalNewMessages(offsetRanges);
+    LOG.info("About to read " + totalNewMsgs + " from Kafka for topic :" + offsetGen.getTopicName());
     if (totalNewMsgs <= 0) {
-      return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? lastCheckpointStr.get() : "");
-    } else {
-      LOG.info("About to read " + totalNewMsgs + " from Kafka for topic :" + offsetGen.getTopicName());
+      return new InputBatch<>(Option.empty(),
+              lastCheckpointStr.isPresent() ? lastCheckpointStr.get() : CheckpointUtils.offsetsToStr(offsetRanges));

Review comment:
       For a hudi table already has a `""` checkpoint, `lastCheckpointStr` can be `""` here, so i change return value to return new endOffsets checkpoint string in all case.
   And i add a simple test case for kafka latest offset reset strategy in [hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestKafkaSource.java](url).
   Thanks for review! :handshake:




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] garyli1019 commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-643077420


   for example, https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java#L58


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Litianye commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
Litianye commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-643736057


   > > @Litianye Thanks for digging into it. Great work!
   > > @leesf LGTM, please take another pass.
   > 
   > @Litianye LGTM overall, only left one minor comment.
   
   thx for review :handshake: , i have merged  two methods


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-641222330


   @garyli1019 would you please review this one?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] garyli1019 commented on a change in pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#discussion_r437841744



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java
##########
@@ -57,10 +57,10 @@ public AvroKafkaSource(TypedProperties props, JavaSparkContext sparkContext, Spa
   protected InputBatch<JavaRDD<GenericRecord>> fetchNewData(Option<String> lastCheckpointStr, long sourceLimit) {
     OffsetRange[] offsetRanges = offsetGen.getNextOffsetRanges(lastCheckpointStr, sourceLimit);
     long totalNewMsgs = CheckpointUtils.totalNewMessages(offsetRanges);
+    LOG.info("About to read " + totalNewMsgs + " from Kafka for topic :" + offsetGen.getTopicName());
     if (totalNewMsgs <= 0) {
-      return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? lastCheckpointStr.get() : "");

Review comment:
       hmmm interesting... so right now if we use `LATEST` as reset key, then we will fall into a dead loop unless we are lucky enough to have message fall in between two `consumer.endOffsets(topicPartitions)` call.

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java
##########
@@ -57,10 +57,10 @@ public AvroKafkaSource(TypedProperties props, JavaSparkContext sparkContext, Spa
   protected InputBatch<JavaRDD<GenericRecord>> fetchNewData(Option<String> lastCheckpointStr, long sourceLimit) {
     OffsetRange[] offsetRanges = offsetGen.getNextOffsetRanges(lastCheckpointStr, sourceLimit);
     long totalNewMsgs = CheckpointUtils.totalNewMessages(offsetRanges);
+    LOG.info("About to read " + totalNewMsgs + " from Kafka for topic :" + offsetGen.getTopicName());
     if (totalNewMsgs <= 0) {
-      return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? lastCheckpointStr.get() : "");
-    } else {
-      LOG.info("About to read " + totalNewMsgs + " from Kafka for topic :" + offsetGen.getTopicName());
+      return new InputBatch<>(Option.empty(),
+              lastCheckpointStr.isPresent() ? lastCheckpointStr.get() : CheckpointUtils.offsetsToStr(offsetRanges));

Review comment:
       could `lastCheckpointStr` be `""` here? 
   Also, can we add a test for this case?
   https://github.com/apache/hudi/blob/master/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestKafkaSource.java#L107




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] garyli1019 commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-642427774


   @Litianye no worry we can work through this together. I believe all the sources treat empty string and `Option.empty()` the same. If not then it's a bug. If we don't fix it now, the empty string will haunt us later.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-641096930


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=h1) Report
   > Merging [#1719](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/9e07cebece3b4c8b964ddca2f40053734a392ce2&el=desc) will **decrease** coverage by `0.03%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1719/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1719      +/-   ##
   ============================================
   - Coverage     18.17%   18.14%   -0.04%     
   - Complexity      857      859       +2     
   ============================================
     Files           348      352       +4     
     Lines         15369    15411      +42     
     Branches       1525     1525              
   ============================================
   + Hits           2794     2797       +3     
   - Misses        12217    12256      +39     
     Partials        358      358              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...org/apache/hudi/config/HoodieCompactionConfig.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUNvbXBhY3Rpb25Db25maWcuamF2YQ==) | `55.33% <0.00%> (-0.67%)` | `3.00% <0.00%> (ø%)` | |
   | [...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh) | `55.39% <0.00%> (ø)` | `15.00% <0.00%> (ø%)` | |
   | [.../hadoop/realtime/AbstractRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0Fic3RyYWN0UmVhbHRpbWVSZWNvcmRSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [.../hadoop/realtime/RealtimeUnmergedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lVW5tZXJnZWRSZWNvcmRSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...hadoop/realtime/RealtimeCompactedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lQ29tcGFjdGVkUmVjb3JkUmVhZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...oop/realtime/HoodieParquetRealtimeInputFormat.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVBhcnF1ZXRSZWFsdGltZUlucHV0Rm9ybWF0LmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...op/realtime/HoodieCombineRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZUNvbWJpbmVSZWFsdGltZVJlY29yZFJlYWRlci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...on/rollback/MergeOnReadRollbackActionExecutor.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL3JvbGxiYWNrL01lcmdlT25SZWFkUm9sbGJhY2tBY3Rpb25FeGVjdXRvci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | ... and [8 more](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=footer). Last update [9e07ceb...2cc2375](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-641096930


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=h1) Report
   > Merging [#1719](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/9e07cebece3b4c8b964ddca2f40053734a392ce2&el=desc) will **decrease** coverage by `0.03%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1719/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1719      +/-   ##
   ============================================
   - Coverage     18.17%   18.14%   -0.04%     
   - Complexity      857      859       +2     
   ============================================
     Files           348      352       +4     
     Lines         15369    15411      +42     
     Branches       1525     1525              
   ============================================
   + Hits           2794     2797       +3     
   - Misses        12217    12256      +39     
     Partials        358      358              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...org/apache/hudi/config/HoodieCompactionConfig.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUNvbXBhY3Rpb25Db25maWcuamF2YQ==) | `55.33% <0.00%> (-0.67%)` | `3.00% <0.00%> (ø%)` | |
   | [...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh) | `55.39% <0.00%> (ø)` | `15.00% <0.00%> (ø%)` | |
   | [.../hadoop/realtime/AbstractRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0Fic3RyYWN0UmVhbHRpbWVSZWNvcmRSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [.../hadoop/realtime/RealtimeUnmergedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lVW5tZXJnZWRSZWNvcmRSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...hadoop/realtime/RealtimeCompactedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lQ29tcGFjdGVkUmVjb3JkUmVhZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...oop/realtime/HoodieParquetRealtimeInputFormat.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVBhcnF1ZXRSZWFsdGltZUlucHV0Rm9ybWF0LmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...op/realtime/HoodieCombineRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZUNvbWJpbmVSZWFsdGltZVJlY29yZFJlYWRlci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...on/rollback/MergeOnReadRollbackActionExecutor.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL3JvbGxiYWNrL01lcmdlT25SZWFkUm9sbGJhY2tBY3Rpb25FeGVjdXRvci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | ... and [8 more](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=footer). Last update [9e07ceb...4628752](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf merged pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
leesf merged pull request #1719:
URL: https://github.com/apache/hudi/pull/1719


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Litianye commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
Litianye commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-642388711


   > I think we can replace empty string by `Option.empty()` https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L261
   > so we only needs to handle one case later. This will make the logic more straight forward. WDYT?
   
   It's a more logical solution, move all handle of checkpoint string into DeltaSync. Commit related change in this issue?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-643594937


   > @Litianye Thanks for digging into it. Great work!
   > @leesf LGTM, please take another pass.
   
   @Litianye LGTM overall, only left one minor comment.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-641096930


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=h1) Report
   > Merging [#1719](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/9e07cebece3b4c8b964ddca2f40053734a392ce2&el=desc) will **increase** coverage by `0.03%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1719/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1719      +/-   ##
   ============================================
   + Coverage     18.17%   18.21%   +0.03%     
   - Complexity      857      859       +2     
   ============================================
     Files           348      348              
     Lines         15369    15356      -13     
     Branches       1525     1523       -2     
   ============================================
   + Hits           2794     2797       +3     
   + Misses        12217    12201      -16     
     Partials        358      358              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...org/apache/hudi/config/HoodieCompactionConfig.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUNvbXBhY3Rpb25Db25maWcuamF2YQ==) | `55.33% <0.00%> (-0.67%)` | `3.00% <0.00%> (ø%)` | |
   | [...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh) | `55.39% <0.00%> (ø)` | `15.00% <0.00%> (ø%)` | |
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=) | `40.71% <0.00%> (+0.63%)` | `50.00% <0.00%> (+2.00%)` | |
   | [.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh) | `7.14% <0.00%> (+1.39%)` | `4.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=footer). Last update [9e07ceb...565f9b4](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Litianye commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
Litianye commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-643142558


   > > In `KafkaSource` like `JsonKafkaSource` & `AvroKafkaSource`, `Option.empty()`and `Option.of("")` treated in reset strategy branch
   > 
   > Can we remove the empty string handling in the source and elsewhere as well? It might confuse other people when they are reading the code there.
   > > https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HiveIncrPullSource.java#L123
   > > 
   > > will return an empty string, can you assess this and how can we fix it?
   > 
   > IMO this is fine, we need to store the empty string in some cases. As long as we can eliminate empty string during the offset calculation, the code should look cleaner.
   
   hmmm,if empty string is fine in `HiveIncrPullSource`, it's easier to achieve the goal:
   >I believe all the sources treat empty string and `Option.empty()` the same.
   
   a little doubt is we will remove all of empty string check in `Source`, or, just treat empty string and `Option.empty()` the same. 
   
   I remove the empty string check in line https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java#L58
   This check is unnecessary in this method.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] garyli1019 commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-641539959


   @Litianye Thanks for making this PR. Will review soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on a change in pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#discussion_r439723271



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestKafkaSource.java
##########
@@ -93,6 +93,18 @@ private TypedProperties createPropsForJsonSource(Long maxEventsToReadFromKafkaSo
     return props;
   }
 
+  private TypedProperties createLatestPropsForJsonSource(Long maxEventsToReadFromKafkaSource) {
+    TypedProperties props = new TypedProperties();
+    props.setProperty("hoodie.deltastreamer.source.kafka.topic", TEST_TOPIC_NAME);
+    props.setProperty("bootstrap.servers", testUtils.brokerAddress());
+    props.setProperty("auto.offset.reset", "latest");
+    props.setProperty("hoodie.deltastreamer.kafka.source.maxEvents",
+            maxEventsToReadFromKafkaSource != null ? String.valueOf(maxEventsToReadFromKafkaSource) :
+                    String.valueOf(Config.maxEventsFromKafkaSource));
+    props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
+    return props;
+  }
+

Review comment:
       we would merge the method with `createPropsForJsonSource` to reuse code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Litianye commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
Litianye commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-642497057


   > @Litianye no worry we can work through this together. I believe all the sources treat empty string and `Option.empty()` the same. If not then it's a bug. If we don't fix it now, the empty string will haunt us later.
   
   Get it! I add little change in `DeltaSync`. 
   
   In `KafkaSource` like `JsonKafkaSource` & `AvroKafkaSource`, `Option.empty()`and `Option.of("")` treated in reset strategy  branch https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java#L184
   
   In `DFSSource` like `AvroDFSSource` & `JsonDFSSource` & `CsvDFSSource` & `ParquetDFSSource`,  i think `Option.of("")` will never occur in method https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DFSPathSelector.java#L64, because in `DFSPathSelector` checkpoint generator from `FileStatus.getModificationTime()`.
   
   In `HoodieIncrSource`, this case fixed in line https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L106
   
   In `HiveIncrPullSource`, in line https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HiveIncrPullSource.java#L123 will return an empty string, can you assess this and how can we fix it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-641096930


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=h1) Report
   > Merging [#1719](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/9e07cebece3b4c8b964ddca2f40053734a392ce2&el=desc) will **increase** coverage by `0.03%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1719/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1719      +/-   ##
   ============================================
   + Coverage     18.17%   18.21%   +0.03%     
   - Complexity      857      859       +2     
   ============================================
     Files           348      348              
     Lines         15369    15356      -13     
     Branches       1525     1523       -2     
   ============================================
   + Hits           2794     2797       +3     
   + Misses        12217    12201      -16     
     Partials        358      358              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...org/apache/hudi/config/HoodieCompactionConfig.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUNvbXBhY3Rpb25Db25maWcuamF2YQ==) | `55.33% <0.00%> (-0.67%)` | `3.00% <0.00%> (ø%)` | |
   | [...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh) | `55.39% <0.00%> (ø)` | `15.00% <0.00%> (ø%)` | |
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=) | `40.71% <0.00%> (+0.63%)` | `50.00% <0.00%> (+2.00%)` | |
   | [.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh) | `7.14% <0.00%> (+1.39%)` | `4.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=footer). Last update [9e07ceb...565f9b4](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] garyli1019 commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-643069988


   
   > In `KafkaSource` like `JsonKafkaSource` & `AvroKafkaSource`, `Option.empty()`and `Option.of("")` treated in reset strategy branch
   
   Can we remove the empty string handling in the source and elsewhere as well? It might confuse other people when they are reading the code there.
   
   > https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HiveIncrPullSource.java#L123
   > 
   > will return an empty string, can you assess this and how can we fix it?
   
   IMO this is fine, we need to store the empty string in some cases. As long as we can eliminate empty string during the offset calculation, the code should look cleaner. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-641096930


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=h1) Report
   > Merging [#1719](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/9e07cebece3b4c8b964ddca2f40053734a392ce2&el=desc) will **decrease** coverage by `0.03%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1719/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1719      +/-   ##
   ============================================
   - Coverage     18.17%   18.14%   -0.04%     
   - Complexity      857      859       +2     
   ============================================
     Files           348      352       +4     
     Lines         15369    15411      +42     
     Branches       1525     1525              
   ============================================
   + Hits           2794     2797       +3     
   - Misses        12217    12256      +39     
     Partials        358      358              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...org/apache/hudi/config/HoodieCompactionConfig.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUNvbXBhY3Rpb25Db25maWcuamF2YQ==) | `55.33% <0.00%> (-0.67%)` | `3.00% <0.00%> (ø%)` | |
   | [...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh) | `55.39% <0.00%> (ø)` | `15.00% <0.00%> (ø%)` | |
   | [.../hadoop/realtime/AbstractRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0Fic3RyYWN0UmVhbHRpbWVSZWNvcmRSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [.../hadoop/realtime/RealtimeUnmergedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lVW5tZXJnZWRSZWNvcmRSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...hadoop/realtime/RealtimeCompactedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lQ29tcGFjdGVkUmVjb3JkUmVhZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...oop/realtime/HoodieParquetRealtimeInputFormat.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVBhcnF1ZXRSZWFsdGltZUlucHV0Rm9ybWF0LmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...op/realtime/HoodieCombineRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZUNvbWJpbmVSZWFsdGltZVJlY29yZFJlYWRlci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...on/rollback/MergeOnReadRollbackActionExecutor.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL3JvbGxiYWNrL01lcmdlT25SZWFkUm9sbGJhY2tBY3Rpb25FeGVjdXRvci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | ... and [8 more](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=footer). Last update [9e07ceb...da0e2cb](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-641096930


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=h1) Report
   > Merging [#1719](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/9e07cebece3b4c8b964ddca2f40053734a392ce2&el=desc) will **decrease** coverage by `0.03%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1719/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1719      +/-   ##
   ============================================
   - Coverage     18.17%   18.14%   -0.04%     
   - Complexity      857      859       +2     
   ============================================
     Files           348      352       +4     
     Lines         15369    15411      +42     
     Branches       1525     1525              
   ============================================
   + Hits           2794     2797       +3     
   - Misses        12217    12256      +39     
     Partials        358      358              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...org/apache/hudi/config/HoodieCompactionConfig.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUNvbXBhY3Rpb25Db25maWcuamF2YQ==) | `55.33% <0.00%> (-0.67%)` | `3.00% <0.00%> (ø%)` | |
   | [...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh) | `55.39% <0.00%> (ø)` | `15.00% <0.00%> (ø%)` | |
   | [.../hadoop/realtime/AbstractRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0Fic3RyYWN0UmVhbHRpbWVSZWNvcmRSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [.../hadoop/realtime/RealtimeUnmergedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lVW5tZXJnZWRSZWNvcmRSZWFkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...hadoop/realtime/RealtimeCompactedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lQ29tcGFjdGVkUmVjb3JkUmVhZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...oop/realtime/HoodieParquetRealtimeInputFormat.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVBhcnF1ZXRSZWFsdGltZUlucHV0Rm9ybWF0LmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...op/realtime/HoodieCombineRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZUNvbWJpbmVSZWFsdGltZVJlY29yZFJlYWRlci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | [...on/rollback/MergeOnReadRollbackActionExecutor.java](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL3JvbGxiYWNrL01lcmdlT25SZWFkUm9sbGJhY2tBY3Rpb25FeGVjdXRvci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | ... and [8 more](https://codecov.io/gh/apache/hudi/pull/1719/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=footer). Last update [9e07ceb...6853b8d](https://codecov.io/gh/apache/hudi/pull/1719?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Litianye commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
Litianye commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-642395107


   > I think we can replace empty string by `Option.empty()` https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L261
   > so we only needs to handle one case later. This will make the logic more straight forward. WDYT?
   
   but afraid of empty string has a special meaning in other kind of `Source`, i only use `KafkaSource` before. :no_mouth:
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] garyli1019 commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-642376368


   I think we can replace empty string by `Option.empty()` https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L261
   so we only needs to handle one case later. This will make the logic more straight forward. WDYT?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org