You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/06/21 08:43:13 UTC

[GitHub] [hudi] danny0405 opened a new pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

danny0405 opened a new pull request #3122:
URL: https://github.com/apache/hudi/pull/3122


   … NPE for file group that has only logs
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhangjun0x01 commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r655213190



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -165,16 +165,20 @@
         Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
             .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          final String fileId = fileSlice.getFileId();
+          // filter out the file group that has only logs (say the index is global).
+          if (groupedInputSplits.containsKey(fileId)) {
+            List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileId);
+            dataFileSplits.forEach(split -> {
+              try {
+                List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+                    .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
+                resultMap.put(split, logFilePaths);
+              } catch (Exception e) {
+                throw new HoodieException("Error creating hoodie real time split ", e);

Review comment:
       I mean to modify this content,for example 'Error creating hoodie real time split for group logs by BaseFile',because the exception content of `groupLogsByBaseFile` and `getRealtimeSplits` method are the same.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 closed pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 closed pull request #3122:
URL: https://github.com/apache/hudi/pull/3122


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-874493680


   Does #3203 solve this? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r655208566



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -165,16 +165,20 @@
         Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
             .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          final String fileId = fileSlice.getFileId();
+          // filter out the file group that has only logs (say the index is global).
+          if (groupedInputSplits.containsKey(fileId)) {
+            List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileId);
+            dataFileSplits.forEach(split -> {
+              try {
+                List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+                    .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
+                resultMap.put(split, logFilePaths);
+              } catch (Exception e) {
+                throw new HoodieException("Error creating hoodie real time split ", e);

Review comment:
       The exception already throws out, what kind of exception do you suggest it to be ?

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -165,16 +165,20 @@
         Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
             .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          final String fileId = fileSlice.getFileId();
+          // filter out the file group that has only logs (say the index is global).
+          if (groupedInputSplits.containsKey(fileId)) {
+            List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileId);
+            dataFileSplits.forEach(split -> {
+              try {
+                List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+                    .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
+                resultMap.put(split, logFilePaths);
+              } catch (Exception e) {
+                throw new HoodieException("Error creating hoodie real time split ", e);

Review comment:
       > When we use hive query the hudi table, the same NPE exception may also occur [here](https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java#L105) , should we add the same judgment logic?
   
   Oops, i guess we should




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3122](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9d1fc32) into [master](https://codecov.io/gh/apache/hudi/commit/cdb9b48170ef98634babd8954392efb1c1b90fcf?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cdb9b48) will **decrease** coverage by `42.71%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3122/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3122       +/-   ##
   ============================================
   - Coverage     45.85%   3.13%   -42.72%     
   + Complexity     5269      82     -5187     
   ============================================
     Files           908     274      -634     
     Lines         39332   10650    -28682     
     Branches       4239    1088     -3151     
   ============================================
   - Hits          18036     334    -17702     
   + Misses        19451   10290     -9161     
   + Partials       1845      26     -1819     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-30.45%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.72% <ø> (-45.01%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.52% <ø> (-47.12%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [744 more](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [cdb9b48...9d1fc32](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 closed pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 closed pull request #3122:
URL: https://github.com/apache/hudi/pull/3122


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhangjun0x01 commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864873076


   When we use hive query the hudi table, the same NPE exception may also occur [here](https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java#L105) ,  should we add the same judgment logic?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhangjun0x01 commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r655204771



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -165,16 +165,20 @@
         Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
             .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          final String fileId = fileSlice.getFileId();
+          // filter out the file group that has only logs (say the index is global).
+          if (groupedInputSplits.containsKey(fileId)) {
+            List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileId);
+            dataFileSplits.forEach(split -> {
+              try {
+                List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+                    .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
+                resultMap.put(split, logFilePaths);
+              } catch (Exception e) {
+                throw new HoodieException("Error creating hoodie real time split ", e);

Review comment:
       nit : Should we modify the exception message so that when throw an exception, we can distinguish it from the exception in the `getRealtimeSplits` method?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xiarixiaoyao commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-869514439


   HI @danny0405   yes, we will fix this problem by HUDI-2086, it will be ok to close this one


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3122](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (530787e) into [master](https://codecov.io/gh/apache/hudi/commit/cdb9b48170ef98634babd8954392efb1c1b90fcf?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cdb9b48) will **decrease** coverage by `42.71%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3122/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3122       +/-   ##
   ============================================
   - Coverage     45.85%   3.13%   -42.72%     
   + Complexity     5269      82     -5187     
   ============================================
     Files           908     274      -634     
     Lines         39332   10650    -28682     
     Branches       4239    1088     -3151     
   ============================================
   - Hits          18036     334    -17702     
   + Misses        19451   10290     -9161     
   + Partials       1845      26     -1819     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-30.45%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.72% <ø> (-45.01%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.52% <ø> (-47.12%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [744 more](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [cdb9b48...530787e](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xiarixiaoyao commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-869486590


   HI @vinothchandar  @danny0405    i think we should not filter those logs directly, those logs contains the data which we needed。   we encounter this problem and solved it in our products,this problem is sub_problem3 of HUDI-2086 and the correlation pr will come next few days。


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d1fc32406d8633cef61672457c5448cf5aa4dfa UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhangjun0x01 commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864873076


   When we use hive query the hudi table, the same NPE exception may also occur [here](https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java#L105) ,  should we add the same judgment logic?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-874503060


   > Does #3203 solve this?
   
   I think it should, would just close this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 closed pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 closed pull request #3122:
URL: https://github.com/apache/hudi/pull/3122


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r664267191



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
##########
@@ -209,6 +210,12 @@ public Configuration getConf() {
             colNamesWithTypesForExternal.size(),
             true);
       }
+    } else if (split instanceof RealtimeSplit) {

Review comment:
       can we do this at the sub class? Not sure if we want to make this class aware of RealtimeSplit 

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
##########
@@ -253,4 +260,34 @@ private BootstrapBaseFileSplit makeExternalFileSplit(PathWithBootstrapFileStatus
       throw new HoodieIOException(e.getMessage(), e);
     }
   }
+
+  /**
+   * Dummy record reader that outputs nothing.
+   */
+  public static class DummyRecordReader implements RecordReader<NullWritable, ArrayWritable> {

Review comment:
       rename: NoopRecordReader

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -102,28 +105,37 @@
             HoodieTimeline.ROLLBACK_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieTimeline.REPLACE_COMMIT_ACTION))
             .filterCompletedInstants().lastInstant().get().getTimestamp();
         latestFileSlices.forEach(fileSlice -> {
+          List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+              .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
           List<FileSplit> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              if (split instanceof BootstrapBaseFileSplit) {
-                BootstrapBaseFileSplit eSplit = (BootstrapBaseFileSplit) split;
-                String[] hosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
-                    .filter(x -> !x.isInMemory()).toArray(String[]::new) : new String[0];
-                String[] inMemoryHosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
-                    .filter(SplitLocationInfo::isInMemory).toArray(String[]::new) : new String[0];
-                FileSplit baseSplit = new FileSplit(eSplit.getPath(), eSplit.getStart(), eSplit.getLength(),
-                    hosts, inMemoryHosts);
-                rtSplits.add(new RealtimeBootstrapBaseFileSplit(baseSplit, metaClient.getBasePath(),
-                    logFilePaths, maxCommitTime, eSplit.getBootstrapFileSplit()));
-              } else {
-                rtSplits.add(new HoodieRealtimeFileSplit(split, metaClient.getBasePath(), logFilePaths, maxCommitTime));
+          if (dataFileSplits != null) {
+            dataFileSplits.forEach(split -> {
+              try {
+                if (split instanceof BootstrapBaseFileSplit) {
+                  BootstrapBaseFileSplit eSplit = (BootstrapBaseFileSplit) split;
+                  String[] hosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
+                      .filter(x -> !x.isInMemory()).toArray(String[]::new) : new String[0];
+                  String[] inMemoryHosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
+                      .filter(SplitLocationInfo::isInMemory).toArray(String[]::new) : new String[0];
+                  FileSplit baseSplit = new FileSplit(eSplit.getPath(), eSplit.getStart(), eSplit.getLength(),
+                      hosts, inMemoryHosts);
+                  rtSplits.add(new RealtimeBootstrapBaseFileSplit(baseSplit, metaClient.getBasePath(),
+                      logFilePaths, maxCommitTime, eSplit.getBootstrapFileSplit()));
+                } else {
+                  rtSplits.add(new HoodieRealtimeFileSplit(split, metaClient.getBasePath(), logFilePaths, maxCommitTime));
+                }
+              } catch (IOException e) {
+                throw new HoodieIOException("Error creating hoodie real time split ", e);
               }
+            });
+          } else {
+            // the file group has only logs (say the index is global).
+            try {
+              rtSplits.add(new HoodieRealtimeFileSplit(DummyInputSplit.INSTANCE, metaClient.getBasePath(), logFilePaths, maxCommitTime));

Review comment:
       This code seems to be intending to solely avoid reading this split, if it only has logs? Can't we just skip this fileId /fileslice, instead of adding a dummy split and record reader. They do add quite bit of complexity here. Wondering if we can fix it localized manner if all we want to do is avoid NPE




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r655208566



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -165,16 +165,20 @@
         Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
             .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          final String fileId = fileSlice.getFileId();
+          // filter out the file group that has only logs (say the index is global).
+          if (groupedInputSplits.containsKey(fileId)) {
+            List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileId);
+            dataFileSplits.forEach(split -> {
+              try {
+                List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+                    .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
+                resultMap.put(split, logFilePaths);
+              } catch (Exception e) {
+                throw new HoodieException("Error creating hoodie real time split ", e);

Review comment:
       The exception already throws out, what kind of exception do you suggest it to be ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d1fc32406d8633cef61672457c5448cf5aa4dfa UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=307",
       "triggerID" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "triggerType" : "PUSH"
     }, {
       "hash" : "84bfc2d2c3bea3d088e3c1f8b0a390c586d80e48",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "84bfc2d2c3bea3d088e3c1f8b0a390c586d80e48",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 530787e9a14f8ac180399f4641a58e5eff6ada11 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=307) 
   * 84bfc2d2c3bea3d088e3c1f8b0a390c586d80e48 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 closed pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 closed pull request #3122:
URL: https://github.com/apache/hudi/pull/3122


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=307",
       "triggerID" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "triggerType" : "PUSH"
     }, {
       "hash" : "84bfc2d2c3bea3d088e3c1f8b0a390c586d80e48",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=555",
       "triggerID" : "84bfc2d2c3bea3d088e3c1f8b0a390c586d80e48",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 84bfc2d2c3bea3d088e3c1f8b0a390c586d80e48 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=555) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r664272560



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -102,28 +105,37 @@
             HoodieTimeline.ROLLBACK_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieTimeline.REPLACE_COMMIT_ACTION))
             .filterCompletedInstants().lastInstant().get().getTimestamp();
         latestFileSlices.forEach(fileSlice -> {
+          List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+              .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
           List<FileSplit> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              if (split instanceof BootstrapBaseFileSplit) {
-                BootstrapBaseFileSplit eSplit = (BootstrapBaseFileSplit) split;
-                String[] hosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
-                    .filter(x -> !x.isInMemory()).toArray(String[]::new) : new String[0];
-                String[] inMemoryHosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
-                    .filter(SplitLocationInfo::isInMemory).toArray(String[]::new) : new String[0];
-                FileSplit baseSplit = new FileSplit(eSplit.getPath(), eSplit.getStart(), eSplit.getLength(),
-                    hosts, inMemoryHosts);
-                rtSplits.add(new RealtimeBootstrapBaseFileSplit(baseSplit, metaClient.getBasePath(),
-                    logFilePaths, maxCommitTime, eSplit.getBootstrapFileSplit()));
-              } else {
-                rtSplits.add(new HoodieRealtimeFileSplit(split, metaClient.getBasePath(), logFilePaths, maxCommitTime));
+          if (dataFileSplits != null) {
+            dataFileSplits.forEach(split -> {
+              try {
+                if (split instanceof BootstrapBaseFileSplit) {
+                  BootstrapBaseFileSplit eSplit = (BootstrapBaseFileSplit) split;
+                  String[] hosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
+                      .filter(x -> !x.isInMemory()).toArray(String[]::new) : new String[0];
+                  String[] inMemoryHosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
+                      .filter(SplitLocationInfo::isInMemory).toArray(String[]::new) : new String[0];
+                  FileSplit baseSplit = new FileSplit(eSplit.getPath(), eSplit.getStart(), eSplit.getLength(),
+                      hosts, inMemoryHosts);
+                  rtSplits.add(new RealtimeBootstrapBaseFileSplit(baseSplit, metaClient.getBasePath(),
+                      logFilePaths, maxCommitTime, eSplit.getBootstrapFileSplit()));
+                } else {
+                  rtSplits.add(new HoodieRealtimeFileSplit(split, metaClient.getBasePath(), logFilePaths, maxCommitTime));
+                }
+              } catch (IOException e) {
+                throw new HoodieIOException("Error creating hoodie real time split ", e);
               }
+            });
+          } else {
+            // the file group has only logs (say the index is global).
+            try {
+              rtSplits.add(new HoodieRealtimeFileSplit(DummyInputSplit.INSTANCE, metaClient.getBasePath(), logFilePaths, maxCommitTime));

Review comment:
       Personally i want to fix the NPE at first, read the file groups with parquets is better than a thrown exception. You you think reading the logs only file group adds quite bit of complexity, we can avoid that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3122](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (530787e) into [master](https://codecov.io/gh/apache/hudi/commit/cdb9b48170ef98634babd8954392efb1c1b90fcf?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cdb9b48) will **increase** coverage by `0.15%`.
   > The diff coverage is `22.58%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3122/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3122      +/-   ##
   ============================================
   + Coverage     45.85%   46.01%   +0.15%     
   - Complexity     5269     5306      +37     
   ============================================
     Files           908      911       +3     
     Lines         39332    39480     +148     
     Branches       4239     4256      +17     
   ============================================
   + Hits          18036    18167     +131     
   - Misses        19451    19458       +7     
   - Partials       1845     1855      +10     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <ø> (ø)` | |
   | hudiclient | `30.44% <ø> (ø)` | |
   | hudicommon | `47.56% <ø> (-0.02%)` | :arrow_down: |
   | hudiflink | `61.33% <ø> (ø)` | |
   | hudihadoopmr | `51.23% <22.58%> (-0.06%)` | :arrow_down: |
   | hudisparkdatasource | `66.52% <ø> (ø)` | |
   | hudisync | `51.73% <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | |
   | hudiutilities | `58.37% <ø> (+1.73%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...i/hadoop/utils/HoodieRealtimeInputFormatUtils.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZVJlYWx0aW1lSW5wdXRGb3JtYXRVdGlscy5qYXZh) | `34.45% <22.58%> (-0.33%)` | :arrow_down: |
   | [...ache/hudi/table/action/rollback/RollbackUtils.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RhYmxlL2FjdGlvbi9yb2xsYmFjay9Sb2xsYmFja1V0aWxzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [.../org/apache/hudi/utilities/sources/JdbcSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSmRiY1NvdXJjZS5qYXZh) | `92.00% <0.00%> (ø)` | |
   | [...apache/hudi/exception/HoodieMetadataException.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZU1ldGFkYXRhRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...ava/org/apache/hudi/utilities/SqlQueryBuilder.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1NxbFF1ZXJ5QnVpbGRlci5qYXZh) | `92.50% <0.00%> (ø)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `72.83% <0.00%> (+0.39%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [cdb9b48...530787e](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=307",
       "triggerID" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d1fc32406d8633cef61672457c5448cf5aa4dfa Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302) 
   * 530787e9a14f8ac180399f4641a58e5eff6ada11 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=307) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-871114694


   @vinothchandar I have modified to support pure logs file group reading, can you take a look again, thanks ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d1fc32406d8633cef61672457c5448cf5aa4dfa Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864876572


   Hi @vinothchandar , can you also take a look, i guess this is also a blocker because the global index causes this problem (file group that has no parquet triggers the NPE).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d1fc32406d8633cef61672457c5448cf5aa4dfa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhangjun0x01 commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
zhangjun0x01 commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r655204771



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -165,16 +165,20 @@
         Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
             .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          final String fileId = fileSlice.getFileId();
+          // filter out the file group that has only logs (say the index is global).
+          if (groupedInputSplits.containsKey(fileId)) {
+            List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileId);
+            dataFileSplits.forEach(split -> {
+              try {
+                List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+                    .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
+                resultMap.put(split, logFilePaths);
+              } catch (Exception e) {
+                throw new HoodieException("Error creating hoodie real time split ", e);

Review comment:
       nit : Should we modify the exception message so that when throw an exception, we can distinguish it from the exception in the `getRealtimeSplits` method?

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -165,16 +165,20 @@
         Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
             .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          final String fileId = fileSlice.getFileId();
+          // filter out the file group that has only logs (say the index is global).
+          if (groupedInputSplits.containsKey(fileId)) {
+            List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileId);
+            dataFileSplits.forEach(split -> {
+              try {
+                List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+                    .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
+                resultMap.put(split, logFilePaths);
+              } catch (Exception e) {
+                throw new HoodieException("Error creating hoodie real time split ", e);

Review comment:
       I mean to modify this content,for example 'Error creating hoodie real time split for group logs by BaseFile',because the exception content of `groupLogsByBaseFile` and `getRealtimeSplits` method are the same.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3122](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (84bfc2d) into [master](https://codecov.io/gh/apache/hudi/commit/202887b8ca27eb6de808ba7a2e737b13ae9eb8c0?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (202887b) will **decrease** coverage by `29.88%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3122/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #3122       +/-   ##
   =============================================
   - Coverage     46.19%   16.30%   -29.89%     
   + Complexity     5385      474     -4911     
   =============================================
     Files           921      280      -641     
     Lines         40040    10875    -29165     
     Branches       4294     1106     -3188     
   =============================================
   - Hits          18495     1773    -16722     
   + Misses        19661     8944    -10717     
   + Partials       1884      158     -1726     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-30.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `58.44% <ø> (+0.03%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/hudi/keygen/constant/KeyGeneratorType.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2tleWdlbi9jb25zdGFudC9LZXlHZW5lcmF0b3JUeXBlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pache/hudi/client/utils/ConcatenatingIterator.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC91dGlscy9Db25jYXRlbmF0aW5nSXRlcmF0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../hudi/execution/bulkinsert/BulkInsertSortMode.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2V4ZWN1dGlvbi9idWxraW5zZXJ0L0J1bGtJbnNlcnRTb3J0TW9kZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...able/action/compact/CompactionTriggerStrategy.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RhYmxlL2FjdGlvbi9jb21wYWN0L0NvbXBhY3Rpb25UcmlnZ2VyU3RyYXRlZ3kuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [707 more](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [202887b...84bfc2d](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=307",
       "triggerID" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 530787e9a14f8ac180399f4641a58e5eff6ada11 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=307) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3122](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9d1fc32) into [master](https://codecov.io/gh/apache/hudi/commit/cdb9b48170ef98634babd8954392efb1c1b90fcf?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cdb9b48) will **decrease** coverage by `42.71%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3122/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3122       +/-   ##
   ============================================
   - Coverage     45.85%   3.13%   -42.72%     
   + Complexity     5269      82     -5187     
   ============================================
     Files           908     274      -634     
     Lines         39332   10650    -28682     
     Branches       4239    1088     -3151     
   ============================================
   - Hits          18036     334    -17702     
   + Misses        19451   10290     -9161     
   + Partials       1845      26     -1819     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-30.45%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `6.72% <ø> (-45.01%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.52% <ø> (-47.12%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [744 more](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [cdb9b48...9d1fc32](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r655226296



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -165,16 +165,20 @@
         Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
             .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          final String fileId = fileSlice.getFileId();
+          // filter out the file group that has only logs (say the index is global).
+          if (groupedInputSplits.containsKey(fileId)) {
+            List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileId);
+            dataFileSplits.forEach(split -> {
+              try {
+                List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+                    .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
+                resultMap.put(split, logFilePaths);
+              } catch (Exception e) {
+                throw new HoodieException("Error creating hoodie real time split ", e);

Review comment:
       > When we use hive query the hudi table, the same NPE exception may also occur [here](https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java#L105) , should we add the same judgment logic?
   
   Oops, i guess we should




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r664272560



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -102,28 +105,37 @@
             HoodieTimeline.ROLLBACK_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieTimeline.REPLACE_COMMIT_ACTION))
             .filterCompletedInstants().lastInstant().get().getTimestamp();
         latestFileSlices.forEach(fileSlice -> {
+          List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+              .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
           List<FileSplit> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              if (split instanceof BootstrapBaseFileSplit) {
-                BootstrapBaseFileSplit eSplit = (BootstrapBaseFileSplit) split;
-                String[] hosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
-                    .filter(x -> !x.isInMemory()).toArray(String[]::new) : new String[0];
-                String[] inMemoryHosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
-                    .filter(SplitLocationInfo::isInMemory).toArray(String[]::new) : new String[0];
-                FileSplit baseSplit = new FileSplit(eSplit.getPath(), eSplit.getStart(), eSplit.getLength(),
-                    hosts, inMemoryHosts);
-                rtSplits.add(new RealtimeBootstrapBaseFileSplit(baseSplit, metaClient.getBasePath(),
-                    logFilePaths, maxCommitTime, eSplit.getBootstrapFileSplit()));
-              } else {
-                rtSplits.add(new HoodieRealtimeFileSplit(split, metaClient.getBasePath(), logFilePaths, maxCommitTime));
+          if (dataFileSplits != null) {
+            dataFileSplits.forEach(split -> {
+              try {
+                if (split instanceof BootstrapBaseFileSplit) {
+                  BootstrapBaseFileSplit eSplit = (BootstrapBaseFileSplit) split;
+                  String[] hosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
+                      .filter(x -> !x.isInMemory()).toArray(String[]::new) : new String[0];
+                  String[] inMemoryHosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
+                      .filter(SplitLocationInfo::isInMemory).toArray(String[]::new) : new String[0];
+                  FileSplit baseSplit = new FileSplit(eSplit.getPath(), eSplit.getStart(), eSplit.getLength(),
+                      hosts, inMemoryHosts);
+                  rtSplits.add(new RealtimeBootstrapBaseFileSplit(baseSplit, metaClient.getBasePath(),
+                      logFilePaths, maxCommitTime, eSplit.getBootstrapFileSplit()));
+                } else {
+                  rtSplits.add(new HoodieRealtimeFileSplit(split, metaClient.getBasePath(), logFilePaths, maxCommitTime));
+                }
+              } catch (IOException e) {
+                throw new HoodieIOException("Error creating hoodie real time split ", e);
               }
+            });
+          } else {
+            // the file group has only logs (say the index is global).
+            try {
+              rtSplits.add(new HoodieRealtimeFileSplit(DummyInputSplit.INSTANCE, metaClient.getBasePath(), logFilePaths, maxCommitTime));

Review comment:
       Personally i want to fix the NPE at first, read the file groups with parquets is better than a thrown exception. You you think reading the logs only file group adds quite bit of complexity, we can avoid that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=307",
       "triggerID" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "triggerType" : "PUSH"
     }, {
       "hash" : "84bfc2d2c3bea3d088e3c1f8b0a390c586d80e48",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=555",
       "triggerID" : "84bfc2d2c3bea3d088e3c1f8b0a390c586d80e48",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 530787e9a14f8ac180399f4641a58e5eff6ada11 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=307) 
   * 84bfc2d2c3bea3d088e3c1f8b0a390c586d80e48 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=555) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-874503060


   > Does #3203 solve this?
   
   I think it should, would just close this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3122](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (84bfc2d) into [master](https://codecov.io/gh/apache/hudi/commit/202887b8ca27eb6de808ba7a2e737b13ae9eb8c0?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (202887b) will **decrease** coverage by `43.12%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3122/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3122       +/-   ##
   ============================================
   - Coverage     46.19%   3.07%   -43.13%     
   + Complexity     5385      82     -5303     
   ============================================
     Files           921     280      -641     
     Lines         40040   10875    -29165     
     Branches       4294    1106     -3188     
   ============================================
   - Hits          18495     334    -18161     
   + Misses        19661   10515     -9146     
   + Partials       1884      26     -1858     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-30.46%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.46% <ø> (-48.95%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [753 more](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [202887b...84bfc2d](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864876572


   Hi @vinothchandar , can you also take a look, i guess this is also a blocker because the global index causes this problem (file group that has no parquet triggers the NPE).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#discussion_r664267191



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
##########
@@ -209,6 +210,12 @@ public Configuration getConf() {
             colNamesWithTypesForExternal.size(),
             true);
       }
+    } else if (split instanceof RealtimeSplit) {

Review comment:
       can we do this at the sub class? Not sure if we want to make this class aware of RealtimeSplit 

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
##########
@@ -253,4 +260,34 @@ private BootstrapBaseFileSplit makeExternalFileSplit(PathWithBootstrapFileStatus
       throw new HoodieIOException(e.getMessage(), e);
     }
   }
+
+  /**
+   * Dummy record reader that outputs nothing.
+   */
+  public static class DummyRecordReader implements RecordReader<NullWritable, ArrayWritable> {

Review comment:
       rename: NoopRecordReader

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -102,28 +105,37 @@
             HoodieTimeline.ROLLBACK_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieTimeline.REPLACE_COMMIT_ACTION))
             .filterCompletedInstants().lastInstant().get().getTimestamp();
         latestFileSlices.forEach(fileSlice -> {
+          List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+              .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
           List<FileSplit> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
-                  .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              if (split instanceof BootstrapBaseFileSplit) {
-                BootstrapBaseFileSplit eSplit = (BootstrapBaseFileSplit) split;
-                String[] hosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
-                    .filter(x -> !x.isInMemory()).toArray(String[]::new) : new String[0];
-                String[] inMemoryHosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
-                    .filter(SplitLocationInfo::isInMemory).toArray(String[]::new) : new String[0];
-                FileSplit baseSplit = new FileSplit(eSplit.getPath(), eSplit.getStart(), eSplit.getLength(),
-                    hosts, inMemoryHosts);
-                rtSplits.add(new RealtimeBootstrapBaseFileSplit(baseSplit, metaClient.getBasePath(),
-                    logFilePaths, maxCommitTime, eSplit.getBootstrapFileSplit()));
-              } else {
-                rtSplits.add(new HoodieRealtimeFileSplit(split, metaClient.getBasePath(), logFilePaths, maxCommitTime));
+          if (dataFileSplits != null) {
+            dataFileSplits.forEach(split -> {
+              try {
+                if (split instanceof BootstrapBaseFileSplit) {
+                  BootstrapBaseFileSplit eSplit = (BootstrapBaseFileSplit) split;
+                  String[] hosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
+                      .filter(x -> !x.isInMemory()).toArray(String[]::new) : new String[0];
+                  String[] inMemoryHosts = split.getLocationInfo() != null ? Arrays.stream(split.getLocationInfo())
+                      .filter(SplitLocationInfo::isInMemory).toArray(String[]::new) : new String[0];
+                  FileSplit baseSplit = new FileSplit(eSplit.getPath(), eSplit.getStart(), eSplit.getLength(),
+                      hosts, inMemoryHosts);
+                  rtSplits.add(new RealtimeBootstrapBaseFileSplit(baseSplit, metaClient.getBasePath(),
+                      logFilePaths, maxCommitTime, eSplit.getBootstrapFileSplit()));
+                } else {
+                  rtSplits.add(new HoodieRealtimeFileSplit(split, metaClient.getBasePath(), logFilePaths, maxCommitTime));
+                }
+              } catch (IOException e) {
+                throw new HoodieIOException("Error creating hoodie real time split ", e);
               }
+            });
+          } else {
+            // the file group has only logs (say the index is global).
+            try {
+              rtSplits.add(new HoodieRealtimeFileSplit(DummyInputSplit.INSTANCE, metaClient.getBasePath(), logFilePaths, maxCommitTime));

Review comment:
       This code seems to be intending to solely avoid reading this split, if it only has logs? Can't we just skip this fileId /fileslice, instead of adding a dummy split and record reader. They do add quite bit of complexity here. Wondering if we can fix it localized manner if all we want to do is avoid NPE




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864851929


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302",
       "triggerID" : "9d1fc32406d8633cef61672457c5448cf5aa4dfa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "530787e9a14f8ac180399f4641a58e5eff6ada11",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d1fc32406d8633cef61672457c5448cf5aa4dfa Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=302) 
   * 530787e9a14f8ac180399f4641a58e5eff6ada11 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-869492247


   > HI @vinothchandar @danny0405 i think we should not filter those logs directly, those logs contains the data which we needed。 we encounter this problem and solved it in our products,this problem is sub_problem3 of HUDI-2086 and the correlation pr will come next few days。
   
   That's cool, so you mean you would fix that in HUDI-2086 though ? Can we close this one instead ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-874493680


   Does #3203 solve this? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3122](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (84bfc2d) into [master](https://codecov.io/gh/apache/hudi/commit/202887b8ca27eb6de808ba7a2e737b13ae9eb8c0?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (202887b) will **decrease** coverage by `17.71%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3122/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #3122       +/-   ##
   =============================================
   - Coverage     46.19%   28.47%   -17.72%     
   + Complexity     5385     1262     -4123     
   =============================================
     Files           921      376      -545     
     Lines         40040    14326    -25714     
     Branches       4294     1458     -2836     
   =============================================
   - Hits          18495     4080    -14415     
   + Misses        19661     9951     -9710     
   + Partials       1884      295     -1589     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `22.29% <ø> (-8.16%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `58.44% <ø> (+0.03%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/hudi/metrics/MetricsReporterType.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyVHlwZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...rg/apache/hudi/client/bootstrap/BootstrapMode.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9ib290c3RyYXAvQm9vdHN0cmFwTW9kZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...he/hudi/hive/HiveStylePartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN0eWxlUGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/hudi/keygen/constant/KeyGeneratorType.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2tleWdlbi9jb25zdGFudC9LZXlHZW5lcmF0b3JUeXBlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pache/hudi/client/utils/ConcatenatingIterator.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC91dGlscy9Db25jYXRlbmF0aW5nSXRlcmF0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../hudi/execution/bulkinsert/BulkInsertSortMode.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2V4ZWN1dGlvbi9idWxraW5zZXJ0L0J1bGtJbnNlcnRTb3J0TW9kZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...able/action/compact/CompactionTriggerStrategy.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RhYmxlL2FjdGlvbi9jb21wYWN0L0NvbXBhY3Rpb25UcmlnZ2VyU3RyYXRlZ3kuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [611 more](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [202887b...84bfc2d](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3122: [HUDI-2048] HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws…

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3122:
URL: https://github.com/apache/hudi/pull/3122#issuecomment-864855800


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3122](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (84bfc2d) into [master](https://codecov.io/gh/apache/hudi/commit/202887b8ca27eb6de808ba7a2e737b13ae9eb8c0?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (202887b) will **increase** coverage by `0.00%`.
   > The diff coverage is `36.95%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3122/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #3122   +/-   ##
   =========================================
     Coverage     46.19%   46.19%           
   - Complexity     5385     5386    +1     
   =========================================
     Files           921      921           
     Lines         40040    40062   +22     
     Branches       4294     4297    +3     
   =========================================
   + Hits          18495    18506   +11     
   - Misses        19661    19669    +8     
   - Partials       1884     1887    +3     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <ø> (ø)` | |
   | hudiclient | `30.45% <ø> (ø)` | |
   | hudicommon | `47.56% <ø> (-0.02%)` | :arrow_down: |
   | hudiflink | `59.91% <ø> (ø)` | |
   | hudihadoopmr | `51.33% <36.95%> (+0.03%)` | :arrow_up: |
   | hudisparkdatasource | `67.06% <ø> (ø)` | |
   | hudisync | `54.05% <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | |
   | hudiutilities | `58.44% <ø> (+0.03%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...i/hadoop/utils/HoodieRealtimeInputFormatUtils.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZVJlYWx0aW1lSW5wdXRGb3JtYXRVdGlscy5qYXZh) | `34.40% <23.52%> (-0.39%)` | :arrow_down: |
   | [...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh) | `44.66% <75.00%> (+4.00%)` | :arrow_up: |
   | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `78.12% <0.00%> (-1.57%)` | :arrow_down: |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/3122/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.28% <0.00%> (+0.33%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [202887b...84bfc2d](https://codecov.io/gh/apache/hudi/pull/3122?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org