Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/06 09:01:55 UTC

[GitHub] [hudi] aliceyyan opened a new pull request, #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

aliceyyan opened a new pull request, #5516:
URL: https://github.com/apache/hudi/pull/5516

   …result is incorrect
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] aliceyyan commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
aliceyyan commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120698051

   @danny0405 Hi Danny, I updated the code. Could you help me with the code review again? Thank you very much ☺




[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120705156

   ## CI report:
   
   * 75ef37eb146b3be501729fa952777cf8439acdd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8465) 
   * f88f4a68fefa4a774768ad2671bb94ab785ca93e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1119434657

   ## CI report:
   
   * 75ef37eb146b3be501729fa952777cf8439acdd0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8465) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120834802

   ## CI report:
   
   * f88f4a68fefa4a774768ad2671bb94ab785ca93e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8511) 
   * 6b98aa1d5c1cd6cee2652062d957b4bb451fadb2 UNKNOWN
   * 0045f11a810f5853e8d4607167216254fb5b4b1d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8515) 
   * 0cfbdfcb5fa8bbeb06c256e6efec59424f11b8d4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8519) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1119520878

   ## CI report:
   
   * 75ef37eb146b3be501729fa952777cf8439acdd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8465) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120992943

   ## CI report:
   
   * 6b98aa1d5c1cd6cee2652062d957b4bb451fadb2 UNKNOWN
   * 0045f11a810f5853e8d4607167216254fb5b4b1d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8515) 
   * 0cfbdfcb5fa8bbeb06c256e6efec59424f11b8d4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8519) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1121121696

   ## CI report:
   
   * 6b98aa1d5c1cd6cee2652062d957b4bb451fadb2 UNKNOWN
   * 0cfbdfcb5fa8bbeb06c256e6efec59424f11b8d4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8519) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5516:
URL: https://github.com/apache/hudi/pull/5516#discussion_r867706679


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputSplit.java:
##########
@@ -43,20 +41,25 @@ public class MergeOnReadInputSplit implements InputSplit {
   private final long maxCompactionMemoryInBytes;
   private final String mergeType;
   private final Option<InstantRange> instantRange;
+  private String fileId;
+
 
   // for streaming reader to record the consumed offset,
   // which is the start of next round reading.
   private long consumed = NUM_NO_CONSUMPTION;
 
+
+
   public MergeOnReadInputSplit(
-      int splitNum,
-      @Nullable String basePath,
-      Option<List<String>> logPaths,
-      String latestCommit,
-      String tablePath,
-      long maxCompactionMemoryInBytes,
-      String mergeType,
-      @Nullable InstantRange instantRange) {
+          int splitNum,
+          @Nullable String basePath,
+          Option<List<String>> logPaths,
+          String latestCommit,

Review Comment:
   Fix the indentation



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputSplit.java:
##########
@@ -18,13 +18,11 @@
 
 package org.apache.hudi.table.format.mor;
 
+import org.apache.flink.core.io.InputSplit;
 import org.apache.hudi.common.table.log.InstantRange;
 import org.apache.hudi.common.util.Option;
 
-import org.apache.flink.core.io.InputSplit;
-

Review Comment:
   The change to the import order is not necessary.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##########
@@ -181,7 +182,8 @@ public DataStream<RowData> produceDataStream(StreamExecutionEnvironment execEnv)
           OneInputStreamOperatorFactory<MergeOnReadInputSplit, RowData> factory = StreamReadOperator.factory((MergeOnReadInputFormat) inputFormat);
           SingleOutputStreamOperator<RowData> source = execEnv.addSource(monitoringFunction, getSourceOperatorName("split_monitor"))
               .setParallelism(1)
-              .transform("split_reader", typeInfo, factory)
+                  .keyBy((KeySelector<MergeOnReadInputSplit, String>) mos -> String.valueOf(mos.getFileId()))
+                  .transform("split_reader", typeInfo, factory)

Review Comment:
   Can the explicit `KeySelector` be removed?
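
The hunk above keys the split stream by file ID before the `split_reader` operator. Keying sends every split with the same key to the same downstream subtask, presumably so that splits of one file are read in order by a single reader. The following is a minimal sketch of that routing property in plain Java. `SplitRef`, `FileIdRoutingSketch`, and the `floorMod` mapping are hypothetical stand-ins, not Hudi or Flink code; Flink's actual key-to-subtask assignment goes through key groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a split: only the file ID and an instant label.
final class SplitRef {
  final String fileId;
  final String instant;

  SplitRef(String fileId, String instant) {
    this.fileId = fileId;
    this.instant = instant;
  }
}

final class FileIdRoutingSketch {
  // Simplified key-to-subtask mapping (assumption): Flink's real assignment
  // goes through key groups, but it is equally deterministic per key.
  static int subtaskFor(String fileId, int parallelism) {
    return Math.floorMod(fileId.hashCode(), parallelism);
  }

  // Groups splits by target subtask, keeping arrival order within each subtask.
  static Map<Integer, List<String>> route(List<SplitRef> splits, int parallelism) {
    Map<Integer, List<String>> bySubtask = new TreeMap<>();
    for (SplitRef split : splits) {
      int subtask = subtaskFor(split.fileId, parallelism);
      bySubtask.computeIfAbsent(subtask, k -> new ArrayList<>())
          .add(split.fileId + "@" + split.instant);
    }
    return bySubtask;
  }

  public static void main(String[] args) {
    List<SplitRef> splits = new ArrayList<>();
    splits.add(new SplitRef("file-a", "t1"));
    splits.add(new SplitRef("file-b", "t1"));
    splits.add(new SplitRef("file-a", "t2"));
    // Both "file-a" splits end up in the same list, t1 before t2.
    System.out.println(route(splits, 4));
  }
}
```

Without keying, consecutive splits of the same file could be handed to different reader subtasks, so their relative order downstream would not be guaranteed.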





[GitHub] [hudi] aliceyyan commented on a diff in pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
aliceyyan commented on code in PR #5516:
URL: https://github.com/apache/hudi/pull/5516#discussion_r867615650


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputSplit.java:
##########
@@ -67,6 +67,34 @@ public MergeOnReadInputSplit(
     this.instantRange = Option.ofNullable(instantRange);
   }
 
+  public MergeOnReadInputSplit(
+          int splitNum,
+          @Nullable String basePath,
+          Option<List<String>> logPaths,
+          String latestCommit,
+          String tablePath,
+          long maxCompactionMemoryInBytes,
+          String mergeType,
+          @Nullable InstantRange instantRange,String fileId ) {

Review Comment:
   It does not have to be added. Let me change it, thank you!





[GitHub] [hudi] danny0405 merged pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
danny0405 merged PR #5516:
URL: https://github.com/apache/hudi/pull/5516




[GitHub] [hudi] aliceyyan commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
aliceyyan commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120568745

   @danny0405 Can you help me with the code review? Thank you.




[GitHub] [hudi] danny0405 commented on a diff in pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5516:
URL: https://github.com/apache/hudi/pull/5516#discussion_r867601249


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##########
@@ -181,7 +182,8 @@ public DataStream<RowData> produceDataStream(StreamExecutionEnvironment execEnv)
           OneInputStreamOperatorFactory<MergeOnReadInputSplit, RowData> factory = StreamReadOperator.factory((MergeOnReadInputFormat) inputFormat);
           SingleOutputStreamOperator<RowData> source = execEnv.addSource(monitoringFunction, getSourceOperatorName("split_monitor"))
               .setParallelism(1)
-              .transform("split_reader", typeInfo, factory)
+                  .keyBy((KeySelector<MergeOnReadInputSplit, String>) mos -> String.valueOf(mos.getFileId()))
+                  .transform("split_reader", typeInfo, factory)

Review Comment:
   Is the explicit cast necessary?
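
As a rough illustration of the question above: when a method parameter is already typed as a single-method interface, Java infers the lambda's target type from that parameter, so a cast in front of the lambda is normally redundant. The sketch below uses hypothetical stand-ins (`KeySelectorSketch`, `SplitSketch`, `keyOf`), not Flink's `KeySelector` or Hudi's `MergeOnReadInputSplit`; whether the cast can be dropped in the actual `keyBy` call should still be confirmed by compiling against Flink.

```java
// Hypothetical stand-in for a key selector interface (not Flink's class).
interface KeySelectorSketch<IN, KEY> {
  KEY getKey(IN value);
}

// Hypothetical stand-in for a split that only carries a file ID.
final class SplitSketch {
  private final String fileId;

  SplitSketch(String fileId) {
    this.fileId = fileId;
  }

  String getFileId() {
    return fileId;
  }
}

final class KeyByCastSketch {
  // Mirrors the shape of a keyBy-style method that accepts a selector.
  static <IN, KEY> KEY keyOf(IN value, KeySelectorSketch<IN, KEY> selector) {
    return selector.getKey(value);
  }

  // Form used in the diff: the lambda is cast to the selector type.
  static String withCast(SplitSketch split) {
    return keyOf(split, (KeySelectorSketch<SplitSketch, String>) s -> s.getFileId());
  }

  // Same call without the cast: the parameter type supplies the target type.
  static String withoutCast(SplitSketch split) {
    return keyOf(split, s -> s.getFileId());
  }

  public static void main(String[] args) {
    SplitSketch split = new SplitSketch("file-0001");
    System.out.println(withCast(split).equals(withoutCast(split))); // prints "true"
  }
}
```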



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputSplit.java:
##########
@@ -67,6 +67,34 @@ public MergeOnReadInputSplit(
     this.instantRange = Option.ofNullable(instantRange);
   }
 
+  public MergeOnReadInputSplit(
+          int splitNum,
+          @Nullable String basePath,
+          Option<List<String>> logPaths,
+          String latestCommit,
+          String tablePath,
+          long maxCompactionMemoryInBytes,
+          String mergeType,
+          @Nullable InstantRange instantRange,String fileId ) {

Review Comment:
   Why must we add a new constructor?



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##########
@@ -316,7 +318,7 @@ private List<MergeOnReadInputSplit> buildFileIndex() {
                   .map(logFile -> logFile.getPath().toString())
                   .collect(Collectors.toList()));
               return new MergeOnReadInputSplit(cnt.getAndAdd(1), basePath, logPaths, latestCommit,
-                  metaClient.getBasePath(), maxCompactionMemoryInBytes, mergeType, null);
+                  metaClient.getBasePath(), maxCompactionMemoryInBytes, mergeType, null,fileSlice.getFileId());
             }).collect(Collectors.toList()))

Review Comment:
   `null,fileSlice.getFileId()` -> `null, fileSlice.getFileId()`



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java:
##########
@@ -226,7 +226,7 @@ public Result inputSplits(
               String basePath = fileSlice.getBaseFile().map(BaseFile::getPath).orElse(null);
               return new MergeOnReadInputSplit(cnt.getAndAdd(1),
                   basePath, logPaths, endInstant,
-                  metaClient.getBasePath(), maxCompactionMemoryInBytes, mergeType, instantRange);
+                  metaClient.getBasePath(), maxCompactionMemoryInBytes, mergeType, instantRange,fileSlice.getFileId());
             }).collect(Collectors.toList()))

Review Comment:
   `instantRange,fileSlice.getFileId()` -> `instantRange, fileSlice.getFileId()`





[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120707441

   ## CI report:
   
   * 75ef37eb146b3be501729fa952777cf8439acdd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8465) 
   * f88f4a68fefa4a774768ad2671bb94ab785ca93e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8511) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1119431936

   ## CI report:
   
   * 75ef37eb146b3be501729fa952777cf8439acdd0 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120769792

   ## CI report:
   
   * 75ef37eb146b3be501729fa952777cf8439acdd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8465) 
   * f88f4a68fefa4a774768ad2671bb94ab785ca93e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8511) 
   * 6b98aa1d5c1cd6cee2652062d957b4bb451fadb2 UNKNOWN
   * 0045f11a810f5853e8d4607167216254fb5b4b1d UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120766690

   ## CI report:
   
   * 75ef37eb146b3be501729fa952777cf8439acdd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8465) 
   * f88f4a68fefa4a774768ad2671bb94ab785ca93e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8511) 
   * 6b98aa1d5c1cd6cee2652062d957b4bb451fadb2 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120816391

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "75ef37eb146b3be501729fa952777cf8439acdd0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8465",
       "triggerID" : "75ef37eb146b3be501729fa952777cf8439acdd0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f88f4a68fefa4a774768ad2671bb94ab785ca93e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8511",
       "triggerID" : "f88f4a68fefa4a774768ad2671bb94ab785ca93e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b98aa1d5c1cd6cee2652062d957b4bb451fadb2",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6b98aa1d5c1cd6cee2652062d957b4bb451fadb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0045f11a810f5853e8d4607167216254fb5b4b1d",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8515",
       "triggerID" : "0045f11a810f5853e8d4607167216254fb5b4b1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0cfbdfcb5fa8bbeb06c256e6efec59424f11b8d4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0cfbdfcb5fa8bbeb06c256e6efec59424f11b8d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f88f4a68fefa4a774768ad2671bb94ab785ca93e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8511) 
   * 6b98aa1d5c1cd6cee2652062d957b4bb451fadb2 UNKNOWN
   * 0045f11a810f5853e8d4607167216254fb5b4b1d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8515) 
   * 0cfbdfcb5fa8bbeb06c256e6efec59424f11b8d4 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #5516: HUDI-4044 When reading data from flink-hudi to external storage, the …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5516:
URL: https://github.com/apache/hudi/pull/5516#issuecomment-1120773175

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "75ef37eb146b3be501729fa952777cf8439acdd0",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8465",
       "triggerID" : "75ef37eb146b3be501729fa952777cf8439acdd0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f88f4a68fefa4a774768ad2671bb94ab785ca93e",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8511",
       "triggerID" : "f88f4a68fefa4a774768ad2671bb94ab785ca93e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b98aa1d5c1cd6cee2652062d957b4bb451fadb2",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6b98aa1d5c1cd6cee2652062d957b4bb451fadb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0045f11a810f5853e8d4607167216254fb5b4b1d",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8515",
       "triggerID" : "0045f11a810f5853e8d4607167216254fb5b4b1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0cfbdfcb5fa8bbeb06c256e6efec59424f11b8d4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0cfbdfcb5fa8bbeb06c256e6efec59424f11b8d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 75ef37eb146b3be501729fa952777cf8439acdd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8465) 
   * f88f4a68fefa4a774768ad2671bb94ab785ca93e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8511) 
   * 6b98aa1d5c1cd6cee2652062d957b4bb451fadb2 UNKNOWN
   * 0045f11a810f5853e8d4607167216254fb5b4b1d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8515) 
   * 0cfbdfcb5fa8bbeb06c256e6efec59424f11b8d4 UNKNOWN
   

