Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/22 08:16:30 UTC

[GitHub] [hudi] danny0405 opened a new pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

danny0405 opened a new pull request #3703:
URL: https://github.com/apache/hudi/pull/3703


   … is ignored by MOR snapshot reader
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] garyli1019 commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-971408576


   > Hi everyone, I have used this patch. I ran a batch query with Flink SQL and the results were as expected, but spark-sql still loses data when querying the rt table. I would like this to be treated as a release blocker.
   
   Hi @mincwang, is it possible to reproduce the data loss in a unit test? Are you using the Hive rt view or the Spark MOR relation?





[GitHub] [hudi] hudi-bot commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-976788002


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 477b14f66e26c4606b2c54f4af7193128e8611ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-977954895


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     }, {
       "hash" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 477b14f66e26c4606b2c54f4af7193128e8611ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645) 
   * 36e0b079c84a3bda2c8cecc07214b26259bf81be UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-976788002


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 477b14f66e26c4606b2c54f4af7193128e8611ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-973717580


   When I run the following command to perform the compaction manually and then query again:
   ``` shell
   bin/flink run -c org.apache.hudi.sink.compact.HoodieFlinkCompactor lib/hudi-flink-bundle_2.12-0.10.0-SNAPSHOT.jar --path hdfs:///hudi/debug/user
   ```
   - **hive**
   ![image](https://user-images.githubusercontent.com/33626973/142562949-80f62e7a-0181-462a-a50c-a1c10f241c9c.png)
   - **spark**
   ![image](https://user-images.githubusercontent.com/33626973/142562990-5070f369-4625-4a68-99ae-778522fcc281.png)
   
   





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-978867541


   This patch is great, but there is one small problem:
   - When I delete the record with id 5:
    Hive query result:
   ![image](https://user-images.githubusercontent.com/33626973/143387325-dadd2ba0-d0fa-4e80-835a-b913219ca5d3.png)
    Spark query result:
    ![image](https://user-images.githubusercontent.com/33626973/143387727-37b867c0-4cd0-427b-993a-c097c16770b1.png)
     The Flink query result is correct,
   
   because @danny0405 skipped deleted records in the snapshot query: https://github.com/apache/hudi/pull/4041





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-978873695


   Only after the compaction completes is the result as expected.
   
   In between, I executed the following DML:
   ```sql
   INSERT INTO user VALUES(4,'4');
   UPDATE user SET name='222' WHERE id = 2;
   ```
   
   After the compaction completes:
   hive:
   ![image](https://user-images.githubusercontent.com/33626973/143389680-6f093c74-d683-4ade-8536-79976945232c.png)
   spark:
   ![image](https://user-images.githubusercontent.com/33626973/143389816-1993b8a5-9231-4e65-be79-bf404ce129de.png)
   
   mysql:
   ![image](https://user-images.githubusercontent.com/33626973/143390202-2b1dee57-b615-4b3b-8b95-0c99236a117f.png)
   
   





[GitHub] [hudi] hudi-bot edited a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-924695744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-973697230


   The query results of both Hive and Spark are incorrect





[GitHub] [hudi] hudi-bot commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-977957239


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     }, {
       "hash" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3684",
       "triggerID" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 477b14f66e26c4606b2c54f4af7193128e8611ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645) 
   * 36e0b079c84a3bda2c8cecc07214b26259bf81be Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3684) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-961588461


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-961588461


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>






[GitHub] [hudi] vinothchandar commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-968081314


   Makes sense. Let's dig into this.





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-974792940


   > @mincwang I think I found the cause of this behavior. The codepath of the Hive rt query goes to
   > 
   > https://github.com/apache/hudi/blob/0fb8556b0d9274aef650a46bb82a8cf495d4450b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java#L158-L169
   > 
   > 
   > you could set the config HOODIE_CONSUME_PENDING_COMMITS to true and try again.
   > The Spark MOR snapshot read codepath goes to
   > 
   > https://github.com/apache/hudi/blob/a0dae41409a4f2d509aae1b16a4b509ec774c454/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java#L238-L240
   > 
   > 
   > We should include the compaction request instant here as well.
   > Do you mind having a try with this fix?
   > 
   > The file listing code path of Spark/Hive/Flink is different now, which leads to this issue. We need to unify the file listing as a high-priority task.
   
   Why does the Spark MOR snapshot read codepath go to `hudi-hadoop-mr`? Shouldn't it be `hudi-spark`?





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-974565836


   Looking forward to @garyli1019 fixing it.





[GitHub] [hudi] vinothchandar commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-978189487


   Looking into this





[GitHub] [hudi] garyli1019 merged pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
garyli1019 merged pull request #3703:
URL: https://github.com/apache/hudi/pull/3703


   





[GitHub] [hudi] hudi-bot commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-924695744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] garyli1019 commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-974789158


   @mincwang I think I found the cause of this behavior.
   The codepath of the Hive rt query goes to https://github.com/apache/hudi/blob/0fb8556b0d9274aef650a46bb82a8cf495d4450b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java#L158-L169
   You could set the config HOODIE_CONSUME_PENDING_COMMITS to true and try again.
   
   The Spark MOR snapshot read codepath goes to 
   https://github.com/apache/hudi/blob/a0dae41409a4f2d509aae1b16a4b509ec774c454/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java#L238-L240
   We should include the compaction request instant here as well.
   
   Would you mind trying this fix?
   
   The file listing code paths of Spark, Hive, and Flink currently differ, which leads to this issue. We need to unify the file listing as a high-priority task.
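
To picture why differing file listings can lose data, here is a toy Java model. It is not Hudi code: `FileSlice`, `visibleLogFiles`, the log file names, and the instant times `001` to `003` are all hypothetical. It assumes that a file slice is keyed by a base instant time, and that a reader only merges slices whose base instant appears in the timeline it was given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: FileSlice and visibleLogFiles are hypothetical, not Hudi classes.
final class FileSliceVisibilitySketch {
  static final class FileSlice {
    final String baseInstant;     // instant time the slice is keyed by
    final List<String> logFiles;  // log files appended to this slice

    FileSlice(String baseInstant, String... logFiles) {
      this.baseInstant = baseInstant;
      this.logFiles = Arrays.asList(logFiles);
    }
  }

  // Collects the log files of every slice whose base instant is in the reader's timeline.
  static List<String> visibleLogFiles(List<FileSlice> slices, Set<String> timelineInstants) {
    List<String> visible = new ArrayList<>();
    for (FileSlice slice : slices) {
      if (timelineInstants.contains(slice.baseInstant)) {
        visible.addAll(slice.logFiles);
      }
    }
    return visible;
  }

  // Slice "002" is assumed to sit behind a compaction that is requested but not yet completed.
  static List<FileSlice> sampleSlices() {
    return Arrays.asList(
        new FileSlice("001", "log-001-v1"),
        new FileSlice("002", "log-002-v1"));
  }

  public static void main(String[] args) {
    Set<String> completedOnly = new HashSet<>(Arrays.asList("001", "003"));
    Set<String> withPendingCompaction = new HashSet<>(Arrays.asList("001", "002", "003"));
    System.out.println(visibleLogFiles(sampleSlices(), completedOnly));          // [log-001-v1]
    System.out.println(visibleLogFiles(sampleSlices(), withPendingCompaction));  // [log-001-v1, log-002-v1]
  }
}
```

In this model, two engines that build their timelines differently return different rows for the same table until the compaction completes, which matches the symptom reported in this thread.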





[GitHub] [hudi] garyli1019 commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-974789746


   Filed a ticket: https://issues.apache.org/jira/browse/HUDI-2816





[GitHub] [hudi] hudi-bot removed a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-976710786


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   * 477b14f66e26c4606b2c54f4af7193128e8611ee Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-978001083


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     }, {
       "hash" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3684",
       "triggerID" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 36e0b079c84a3bda2c8cecc07214b26259bf81be Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3684) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinothchandar commented on a change in pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#discussion_r756471788



##########
File path: hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java
##########
@@ -303,8 +303,8 @@ private String getSourceOperatorName(String operatorName) {
     }
 
     HoodieTableFileSystemView fsView = new HoodieTableFileSystemView(metaClient,
-        metaClient.getActiveTimeline().getCommitsTimeline()
-            .filterCompletedInstants(), fileStatuses);
+        // file-slice after pending compaction-requested instant-time is also considered valid
+        metaClient.getCommitsAndCompactionTimeline().filterCompletedAndCompactionInstants(), fileStatuses);

Review comment:
       This has the effect of including `compaction.requested` etc. in the timeline passed to the fs view.

##########
File path: hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java
##########
@@ -303,8 +303,8 @@ private String getSourceOperatorName(String operatorName) {
     }
 
     HoodieTableFileSystemView fsView = new HoodieTableFileSystemView(metaClient,
-        metaClient.getActiveTimeline().getCommitsTimeline()
-            .filterCompletedInstants(), fileStatuses);
+        // file-slice after pending compaction-requested instant-time is also considered valid
+        metaClient.getCommitsAndCompactionTimeline().filterCompletedAndCompactionInstants(), fileStatuses);

Review comment:
       `getCommitsAndCompactionTimeline()` is really `getCommitsOrCompactionTimeline()`
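
The effect of the timeline change in the diff above can be illustrated with a toy model. This is only a sketch: `Instant`, `State`, `completedOnly` and `completedAndCompaction` are hypothetical stand-ins written for this example, not Hudi classes, and the rule "a file slice is visible when its base instant time is on the timeline handed to the view" is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TimelineFilterSketch {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    // Hypothetical stand-in for a timeline instant; NOT Hudi's HoodieInstant.
    static final class Instant {
        final String time;
        final String action;
        final State state;

        Instant(String time, String action, State state) {
            this.time = time;
            this.action = action;
            this.state = state;
        }
    }

    // Models the removed lines of the diff: only completed instants are kept.
    static Set<String> completedOnly(List<Instant> timeline) {
        return timeline.stream()
            .filter(i -> i.state == State.COMPLETED)
            .map(i -> i.time)
            .collect(Collectors.toSet());
    }

    // Models the added lines of the diff (assumed semantics): completed
    // instants plus compaction instants in any state.
    static Set<String> completedAndCompaction(List<Instant> timeline) {
        return timeline.stream()
            .filter(i -> i.state == State.COMPLETED || "compaction".equals(i.action))
            .map(i -> i.time)
            .collect(Collectors.toSet());
    }

    // A delta commit, then a compaction that is scheduled but not executed,
    // then another delta commit.
    static List<Instant> sampleTimeline() {
        return Arrays.asList(
            new Instant("001", "deltacommit", State.COMPLETED),
            new Instant("002", "compaction", State.REQUESTED),
            new Instant("003", "deltacommit", State.COMPLETED));
    }

    public static void main(String[] args) {
        List<Instant> timeline = sampleTimeline();
        // Assumption: log files written after the compaction was scheduled
        // use the compaction instant "002" as their base instant time.
        String sliceBaseInstant = "002";
        System.out.println(completedOnly(timeline).contains(sliceBaseInstant));          // false
        System.out.println(completedAndCompaction(timeline).contains(sliceBaseInstant)); // true
    }
}
```

Under these assumptions, the file slice based on the pending compaction instant is dropped by the first filter and kept by the second, which matches the behaviour the PR title describes.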







[GitHub] [hudi] garyli1019 commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-973711647


   Thanks for the detailed response, @mincwang. I will dig into this over the weekend.





[GitHub] [hudi] garyli1019 commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-974789158









[GitHub] [hudi] danny0405 commented on a change in pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#discussion_r714617062



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -151,8 +151,9 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
       // Load files from the global paths if it has defined to be compatible with the original mode
       val inMemoryFileIndex = HoodieSparkUtils.createInMemoryFileIndex(sqlContext.sparkSession, globPaths.get)
       val fsView = new HoodieTableFileSystemView(metaClient,
-        metaClient.getActiveTimeline.getCommitsTimeline
-          .filterCompletedInstants, inMemoryFileIndex.allFiles().toArray)
+        // file-slice after pending compaction-requested instant-time is also considered valid
+        metaClient.getCommitsAndCompactionTimeline.filterCompletedAndCompactionInstants,
+        inMemoryFileIndex.allFiles().toArray)

Review comment:
       Hi @vinothchandar @nsivabalan, I need your help with this view. I dug into the code a little, and this line confused me: https://github.com/apache/hudi/blob/5515a0d319cbac835c65f6d21898ac1399d77ea3/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java#L443, as did this line:
   https://github.com/apache/hudi/blob/5515a0d319cbac835c65f6d21898ac1399d77ea3/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java#L120.
   
   The point I'm confused about is how we can decide that log files whose base commit time is that of a pending compaction action were committed successfully. I see some code that compares the timestamps, but that is not enough: some intermediate or corrupt files may also be log files with the pending compaction instant time as their base commit time, right?







[GitHub] [hudi] hudi-bot removed a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-924695744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-978839006


   I'll verify it





[GitHub] [hudi] hudi-bot removed a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-977957239


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     }, {
       "hash" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3684",
       "triggerID" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 477b14f66e26c4606b2c54f4af7193128e8611ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645) 
   * 36e0b079c84a3bda2c8cecc07214b26259bf81be Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3684) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] mincwang edited a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang edited a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-971599052


   > > Hi everyone, I have tried this patch. I ran a batch query with Flink SQL and the results were as expected, but spark-sql still loses data when querying the rt table. I would like this to be treated as a release blocker.
   > 
   > Hi @mincwang, is it possible to reproduce the data loss in a unit test? Are you using the Hive rt view or the Spark MOR relation?
   
   Thanks for the reply, @garyli1019. I used the Spark MOR query. The Spark query loses data when Flink streaming writes to Hudi and a compaction has been scheduled but the compaction plan has not been executed.
   ```sql
   -- Flink SQL WITH clause of the DDL
   'compaction.schedule.enabled' = 'true',
   'compaction.async.enabled' = 'false',
   -- 'compaction.trigger.strategy' = 'time_elapsed',
   -- 'compaction.delta_seconds' = '3600'
   ```
   
   Hi @danny0405, can you give us a more professional reply?





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-971599052


   > > Hi everyone, I have tried this patch. I ran a batch query with Flink SQL and the results were as expected, but spark-sql still loses data when querying the rt table. I would like this to be treated as a release blocker.
   > 
   > Hi @mincwang, is it possible to reproduce the data loss in a unit test? Are you using the Hive rt view or the Spark MOR relation?
   
   Thanks for the reply, @garyli1019. I used the Spark MOR query. The Spark query loses data when Flink streaming writes to Hudi and a compaction has been scheduled but the compaction plan has not been executed.
   ```sql
   -- Flink SQL WITH clause of the DDL
   'compaction.async.enabled' = 'false',
   -- 'compaction.trigger.strategy' = 'time_elapsed',
   -- 'compaction.delta_seconds' = '3600'
   ```
   
   Hi @danny0405, can you give us a more professional reply?





[GitHub] [hudi] mincwang removed a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang removed a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-973717580









[GitHub] [hudi] vinothchandar commented on a change in pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#discussion_r756455804



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -151,8 +151,9 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
       // Load files from the global paths if it has defined to be compatible with the original mode
       val inMemoryFileIndex = HoodieSparkUtils.createInMemoryFileIndex(sqlContext.sparkSession, globPaths.get)
       val fsView = new HoodieTableFileSystemView(metaClient,
-        metaClient.getActiveTimeline.getCommitsTimeline
-          .filterCompletedInstants, inMemoryFileIndex.allFiles().toArray)
+        // file-slice after pending compaction-requested instant-time is also considered valid
+        metaClient.getCommitsAndCompactionTimeline.filterCompletedAndCompactionInstants,
+        inMemoryFileIndex.allFiles().toArray)

Review comment:
       We generally filter log blocks, not log files, i.e. we would consider all log files written against the same base commit time and read through them to resolve
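
A minimal sketch of what filtering at the block level could look like. `LogFile`, `LogBlock` and `readValidPayloads` are hypothetical names invented for this illustration and are not Hudi's reader API; the assumption is that every block carries the instant time of the write that produced it and that a block is valid only when that instant is completed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LogBlockFilterSketch {
    // Hypothetical log block: tagged with the instant time of its writer.
    static final class LogBlock {
        final String instantTime;
        final String payload;

        LogBlock(String instantTime, String payload) {
            this.instantTime = instantTime;
            this.payload = payload;
        }
    }

    // Hypothetical log file: belongs to one base commit time, holds many blocks.
    static final class LogFile {
        final String baseCommitTime;
        final List<LogBlock> blocks;

        LogFile(String baseCommitTime, List<LogBlock> blocks) {
            this.baseCommitTime = baseCommitTime;
            this.blocks = blocks;
        }
    }

    // Every log file for the base commit time is scanned; the decision to keep
    // or skip is made per block, based on the completed instants.
    static List<String> readValidPayloads(List<LogFile> logFiles,
                                          String baseCommitTime,
                                          Set<String> completedInstants) {
        List<String> payloads = new ArrayList<>();
        for (LogFile file : logFiles) {
            if (!file.baseCommitTime.equals(baseCommitTime)) {
                continue; // belongs to a different file slice
            }
            for (LogBlock block : file.blocks) {
                if (completedInstants.contains(block.instantTime)) {
                    payloads.add(block.payload);
                }
                // Blocks from failed or in-flight writes are skipped here.
            }
        }
        return payloads;
    }

    // Two log files against base commit "002"; the block written by instant
    // "004" stands for a write that never completed.
    static List<LogFile> sampleLogFiles() {
        return Arrays.asList(
            new LogFile("002", Arrays.asList(
                new LogBlock("003", "a"),
                new LogBlock("004", "b-uncommitted"))),
            new LogFile("002", Arrays.asList(
                new LogBlock("005", "c"))));
    }

    public static void main(String[] args) {
        Set<String> completed = new HashSet<>(Arrays.asList("003", "005"));
        System.out.println(readValidPayloads(sampleLogFiles(), "002", completed)); // [a, c]
    }
}
```

In this model, a log file that carries the pending compaction instant as its base commit time is never rejected as a whole; only the blocks whose writers did not complete are dropped.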







[GitHub] [hudi] hudi-bot commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-961588461


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinothchandar commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-925779838


   I'll take this up. cc @bvaradar in case he has time to take a peek as well.





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-974790661


   Thanks for the reply, @garyli1019. I will try this fix tomorrow. cc @danny0405





[GitHub] [hudi] hudi-bot commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-976710786


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   * 477b14f66e26c4606b2c54f4af7193128e8611ee Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinothchandar commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-978361209


   @mincwang did the patch resolve your Spark and Hive issues?





[GitHub] [hudi] hudi-bot edited a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-924695744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-966990973


   Hi everyone, I have tried this patch. I ran a batch query with Flink SQL and the results were as expected, but spark-sql still loses data when querying the rt table. I would like this to be treated as a release blocker.





[GitHub] [hudi] hudi-bot removed a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-976706367


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   * 477b14f66e26c4606b2c54f4af7193128e8611ee UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-976706367


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a7f6880207d27eb58b074b4c9ae009ce17592f9e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318) 
   * 477b14f66e26c4606b2c54f4af7193128e8611ee UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] garyli1019 commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-976707038


   > Why does the Spark MOR snapshot read codepath go to `hudi-hadoop-mr`? Shouldn't it be `hudi-spark`?
   
   @mincwang the original thought was to reuse the file listing code for Hive and Spark, so I put it under hadoop-mr.
   





[GitHub] [hudi] hudi-bot removed a comment on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-977954895


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2318",
       "triggerID" : "a7f6880207d27eb58b074b4c9ae009ce17592f9e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645",
       "triggerID" : "477b14f66e26c4606b2c54f4af7193128e8611ee",
       "triggerType" : "PUSH"
     }, {
       "hash" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "36e0b079c84a3bda2c8cecc07214b26259bf81be",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 477b14f66e26c4606b2c54f4af7193128e8611ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3645) 
   * 36e0b079c84a3bda2c8cecc07214b26259bf81be UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-973726038


   - **flink**
   Flink batch queries also seem to return wrong results after I manually run the compaction into the Parquet file: ![image](https://user-images.githubusercontent.com/33626973/142564979-ef1816fa-2dcd-4765-8a36-ab9487bf800f.png)
   





[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-973696652


   Hi @garyli1019, I ran into some errors when trying to write a Spark UT, so I am providing the detailed reproduction steps below:
   - mysql
   
   ```sql
   create table user(
       id   int not null primary key,
       name varchar(10) null
   );
   
   ```
   
   - flink job
   ```sql
   CREATE TABLE user_mysql(
    id INT ,
    name STRING,
    PRIMARY KEY(`id`) NOT ENFORCED
   )WITH (
    'connector' = 'mysql-cdc',
    'hostname' = '10.49.0.x',
    'port' = '3306',
    'username' = 'root',
    'password' = '',
    'database-name' = 'poc',
    'table-name' = 'user'
   );
   
   
   CREATE TABLE `user_hudi`
   WITH (
       'connector' = 'hudi',
       'table.type' = 'MERGE_ON_READ',
       'path' = 'hdfs:///hudi/debug/user',
       'index.state.ttl'='0',
       'index.global.enabled' = 'false',
       'write.tasks' = '1',
       'changelog.enabled' = 'true',
       'hive_sync.enable' = 'true',
       'hive_sync.metastore.uris' = 'thrift://10.49.2.x:7004,thrift://10.49.0.40:x',
       'hive_sync.mode' = 'hms',
       'hive_sync.db' = 'debug',
       'hive_sync.auto_create_db' = 'false',
       'hive_sync.table'= 'user',
       'hive_sync.username' = 'hadoop',
       'hive_sync.password' = '',
       'hive_sync.partition_extractor_class' = 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
       'hoodie.datasource.write.hive_style_partitioning' = 'true',
       'compaction.async.enabled' = 'false',            -- highlight
       'compaction.trigger.strategy' = 'time_elapsed',  -- highlight
       'compaction.delta_seconds' = '120'               -- highlight
   ) LIKE `user_mysql`(
       EXCLUDING ALL
       INCLUDING CONSTRAINTS
   );
   
   INSERT INTO user_hudi SELECT * FROM user_mysql;
   ```
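
The three highlighted options are what make a pending compaction likely in this setup: as far as I understand the Flink options, `compaction.async.enabled = false` means the job does not execute compaction itself, while the `time_elapsed` strategy still lets a compaction plan be scheduled once `compaction.delta_seconds` (120 here) has passed. A minimal Python sketch of that trigger check, assuming it is a plain elapsed-time comparison; `should_schedule_compaction` is a hypothetical name for illustration, not a Hudi API:

```python
def should_schedule_compaction(now_s: float,
                               last_compaction_s: float,
                               delta_seconds: int = 120) -> bool:
    """Return True once at least delta_seconds have elapsed.

    Assumption: the time_elapsed strategy compares the time since the
    last compaction (or since the first delta commit) with the threshold.
    """
    return now_s - last_compaction_s >= delta_seconds


# The DML statements in this reproduction are issued at 0, 30, 60, 90 and 120 seconds.
for t in (0, 30, 60, 90, 120):
    print(t, should_schedule_compaction(t, last_compaction_s=0))
```

Under this assumption the plan is scheduled at about the time the last insert arrives, so the final write can land after the compaction-requested instant. The exact moment also depends on checkpoint timing, which this sketch ignores.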
   Now start the Flink job.
   
   Then I execute one DML statement against the MySQL `user` table every 30 seconds:
   ```sql
   -- initial inserts
   INSERT INTO user VALUES(1,'1');
   INSERT INTO user VALUES(2,'2');
   -- after 30 seconds: delete a row
   DELETE FROM user WHERE id = 1;
   -- after 60 seconds: insert a new row
   INSERT INTO user VALUES(3,'3');
   -- after 90 seconds: update that row
   UPDATE user SET name='33' WHERE id = 3;
   -- after 120 seconds: insert another new row
   INSERT INTO user VALUES(4,'4');
   ```
   At this point, querying the `user` table in MySQL should return the following:
   
   ![image](https://user-images.githubusercontent.com/33626973/142558669-48329795-09a9-4c5f-b92d-d1418b93f47c.png)
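
Since the screenshots are not readable in the plain-text archive, the expected rows can be derived by replaying the DML statements above against an in-memory table. This is only a sketch of that replay in Python; `replay` and the tuple encoding of the statements are illustrative and not part of any tool used in the reproduction:

```python
def replay(statements):
    """Apply (op, id, name) tuples to a dict keyed by the primary key."""
    table = {}
    for op, row_id, name in statements:
        if op == "insert":
            if row_id in table:
                raise KeyError(f"duplicate primary key {row_id}")
            table[row_id] = name
        elif op == "update":
            if row_id in table:  # an UPDATE on a missing row changes nothing
                table[row_id] = name
        elif op == "delete":
            table.pop(row_id, None)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return table


# The DML sequence from the reproduction, in execution order.
DML = [
    ("insert", 1, "1"),
    ("insert", 2, "2"),
    ("delete", 1, None),
    ("insert", 3, "3"),
    ("update", 3, "33"),
    ("insert", 4, "4"),
]

expected = replay(DML)
print(sorted(expected.items()))  # [(2, '2'), (3, '33'), (4, '4')]
```

A correct snapshot read from any engine should return these three rows; a result that is missing row 4, for example, would be consistent with the last write being skipped.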
   
   Flink batch query (with the HUDI-2480 patch applied) returns:
   
   ![image](https://user-images.githubusercontent.com/33626973/142558708-f92f0e8b-a785-4afb-a7a3-880078c349ee.png)
   
   Flink streaming changelog query (with the HUDI-2480 patch applied) returns:
   
   ![image](https://user-images.githubusercontent.com/33626973/142558800-2c76d287-62dc-4845-8fc1-5037be96ebae.png)
   
   Hive batch query returns:
   ![image](https://user-images.githubusercontent.com/33626973/142559077-a0b83673-baaf-49d2-8e68-887213fb494d.png)
   
   Spark batch query returns:
   ![image](https://user-images.githubusercontent.com/33626973/142559120-cfd4bf8b-f0df-403e-8969-910834d26e52.png)
   





[GitHub] [hudi] vinothchandar commented on a change in pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#discussion_r714761395



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -151,8 +151,9 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
       // Load files from the global paths if it has defined to be compatible with the original mode
       val inMemoryFileIndex = HoodieSparkUtils.createInMemoryFileIndex(sqlContext.sparkSession, globPaths.get)
       val fsView = new HoodieTableFileSystemView(metaClient,
-        metaClient.getActiveTimeline.getCommitsTimeline
-          .filterCompletedInstants, inMemoryFileIndex.allFiles().toArray)
+        // file-slice after pending compaction-requested instant-time is also considered valid
+        metaClient.getCommitsAndCompactionTimeline.filterCompletedAndCompactionInstants,
+        inMemoryFileIndex.allFiles().toArray)

Review comment:
       Yes, there could be pending writes like that. Let me grok this and get back to you.
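
One way to read the hunk above: the old code handed the file-system view a timeline of completed commits only, while the new code also keeps compaction instants that are still pending. The toy Python model below illustrates why that matters, under the assumption that a file slice is visible only if its base instant time appears in the timeline. `Instant`, `completed_only`, `completed_and_compaction` and `visible_slices` are hypothetical names for this sketch, not Hudi classes or methods:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    time: str        # instant timestamp, ordered lexicographically
    action: str      # e.g. "deltacommit" or "compaction"
    completed: bool


def completed_only(timeline):
    """Model of the old filter: keep completed instants only."""
    return [i for i in timeline if i.completed]


def completed_and_compaction(timeline):
    """Model of the new filter: also keep compaction instants, pending or not."""
    return [i for i in timeline if i.completed or i.action == "compaction"]


def visible_slices(slice_base_instants, timeline):
    """Keep file slices whose base instant time appears in the timeline."""
    known = {i.time for i in timeline}
    return [s for s in slice_base_instants if s in known]


timeline = [
    Instant("001", "deltacommit", True),
    Instant("002", "compaction", False),   # requested, not yet executed
    Instant("003", "deltacommit", True),   # written after the compaction was scheduled
]
# Slice "001" holds the earlier data; slice "002" holds the log files
# written after the compaction was requested.
slices = ["001", "002"]

print(visible_slices(slices, completed_only(timeline)))            # ['001']
print(visible_slices(slices, completed_and_compaction(timeline)))  # ['001', '002']
```

In this simplified model the slice based on the pending compaction instant is dropped by the first filter and kept by the second, which matches the symptom in the PR title. The real view also handles pending writes, replaced file groups and archived instants, none of which are modeled here.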



