Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/06 15:20:28 UTC

[GitHub] [hudi] nsivabalan opened a new pull request, #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

nsivabalan opened a new pull request, #5241:
URL: https://github.com/apache/hudi/pull/5241

   ## What is the purpose of the pull request
   
   - Even when point lookups are enabled for the metadata table's log record reader, an extra full scan is still triggered. The root cause is that the "read lazy" argument was always set to false. The code was likely written with the FILES partition in mind, where reads are always full scans.
   
   ## Brief change log
   
   - Fixed deriving the value of "read lazy" from the force full scan config: if full scan is enabled, "read lazy" is disabled; if full scan is disabled, "read lazy" is enabled.
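
The derivation described in the change log can be sketched as below. This is a minimal illustration only: `LazyReadFlagSketch` and `deriveReadBlocksLazily` are hypothetical names that do not exist in Hudi, and the sketch does not claim to match the code that was actually merged.

```java
// Hypothetical helper, not part of Hudi: illustrates the derivation
// described in the change log above.
final class LazyReadFlagSketch {

  private LazyReadFlagSketch() {
  }

  /**
   * Derives the "read lazy" flag from the full scan setting:
   * full scan enabled means lazy read disabled, and vice versa.
   */
  static boolean deriveReadBlocksLazily(boolean enableFullScan) {
    return !enableFullScan;
  }

  public static void main(String[] args) {
    System.out.println("fullScan=true  -> readLazy=" + deriveReadBlocksLazily(true));
    System.out.println("fullScan=false -> readLazy=" + deriveReadBlocksLazily(false));
  }
}
```

With this shape the caller passes a single boolean and the two behaviors can never be enabled together.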
   
   ## Verify this pull request
   
   - Manually verified the fix.
   - Existing tests should cover this in general.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090439044

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869",
       "triggerID" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f952f1be4c1eed1100f6893cac1d8276fef554ff Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869) 
   * 5e90efba02542f69684c03431f57dd2f322871a0 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090401246

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f952f1be4c1eed1100f6893cac1d8276fef554ff UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090404656

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869",
       "triggerID" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f952f1be4c1eed1100f6893cac1d8276fef554ff Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869) 
   




[GitHub] [hudi] hudi-bot commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090796874

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869",
       "triggerID" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7873",
       "triggerID" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b6dd143b27b0a24ae324799eb1827cdc2fad8939",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7878",
       "triggerID" : "b6dd143b27b0a24ae324799eb1827cdc2fad8939",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b6dd143b27b0a24ae324799eb1827cdc2fad8939 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7878) 
   




[GitHub] [hudi] hudi-bot commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090690183

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869",
       "triggerID" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7873",
       "triggerID" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b6dd143b27b0a24ae324799eb1827cdc2fad8939",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7878",
       "triggerID" : "b6dd143b27b0a24ae324799eb1827cdc2fad8939",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5e90efba02542f69684c03431f57dd2f322871a0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7873) 
   * b6dd143b27b0a24ae324799eb1827cdc2fad8939 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7878) 
   




[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #5241:
URL: https://github.com/apache/hudi/pull/5241#discussion_r844303357


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataMergedLogRecordReader.java:
##########
@@ -64,7 +64,7 @@ private HoodieMetadataMergedLogRecordReader(FileSystem fs, String basePath, Stri
                                               ExternalSpillableMap.DiskMapType diskMapType,
                                               boolean isBitCaskDiskMapCompressionEnabled,
                                               Option<InstantRange> instantRange, boolean enableFullScan) {
-    super(fs, basePath, logFilePaths, readerSchema, latestInstantTime, maxMemorySizeInBytes, false, false, bufferSize,
+    super(fs, basePath, logFilePaths, readerSchema, latestInstantTime, maxMemorySizeInBytes, true, false, bufferSize,

Review Comment:
   We should not couple these: the two configs control different aspects.
   
   1. `forceFullScan` (renamed to make its semantics crystal clear) forces reading all records from the block.
   2. `readBlocksLazily` controls whether we merge blocks eagerly as we read them, or in a backward pass once all of them have been read.
   
   We should actually remove `readBlocksLazily` altogether and keep only the backward-pass behavior, since the alternatives yield incorrect merge results (one example is HUDI-3342; another is incorrect handling of deletes).
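
To illustrate the point about not coupling the two settings, here is a minimal sketch in which each flag is supplied independently, so all four combinations remain expressible. `LogReaderOptionsSketch` is a hypothetical class invented for this example and is not Hudi's API; the field comments only restate the descriptions given in the comment above.

```java
// Hypothetical options holder, not Hudi's API: the two flags are set
// independently instead of one being derived from the other.
final class LogReaderOptionsSketch {

  // Per the comment above: forces reading all records from the block.
  final boolean forceFullScan;

  // Per the comment above: controls the merge strategy, not the scan scope.
  final boolean readBlocksLazily;

  LogReaderOptionsSketch(boolean forceFullScan, boolean readBlocksLazily) {
    this.forceFullScan = forceFullScan;
    this.readBlocksLazily = readBlocksLazily;
  }

  String describe() {
    return "forceFullScan=" + forceFullScan + ", readBlocksLazily=" + readBlocksLazily;
  }

  public static void main(String[] args) {
    boolean[] values = {false, true};
    // Enumerate every combination to show that neither flag constrains the other.
    for (boolean fullScan : values) {
      for (boolean lazy : values) {
        System.out.println(new LogReaderOptionsSketch(fullScan, lazy).describe());
      }
    }
  }
}
```

Deriving one flag from the other would leave only two of these four combinations reachable, which is the coupling the comment argues against.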
   
   





[GitHub] [hudi] nsivabalan commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090437033

   @prashantwason : Can you review this patch?




[GitHub] [hudi] codope commented on a diff in pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #5241:
URL: https://github.com/apache/hudi/pull/5241#discussion_r844140961


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataMergedLogRecordReader.java:
##########
@@ -64,7 +64,7 @@ private HoodieMetadataMergedLogRecordReader(FileSystem fs, String basePath, Stri
                                               ExternalSpillableMap.DiskMapType diskMapType,
                                               boolean isBitCaskDiskMapCompressionEnabled,
                                               Option<InstantRange> instantRange, boolean enableFullScan) {
-    super(fs, basePath, logFilePaths, readerSchema, latestInstantTime, maxMemorySizeInBytes, false, false, bufferSize,
+    super(fs, basePath, logFilePaths, readerSchema, latestInstantTime, maxMemorySizeInBytes, true, false, bufferSize,

Review Comment:
   Why not `!enableFullScan` instead of hard-coding it to `true`?





[GitHub] [hudi] hudi-bot commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090468346

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869",
       "triggerID" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7873",
       "triggerID" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f952f1be4c1eed1100f6893cac1d8276fef554ff Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869) 
   * 5e90efba02542f69684c03431f57dd2f322871a0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7873) 
   




[GitHub] [hudi] hudi-bot commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090687464

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869",
       "triggerID" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7873",
       "triggerID" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b6dd143b27b0a24ae324799eb1827cdc2fad8939",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b6dd143b27b0a24ae324799eb1827cdc2fad8939",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5e90efba02542f69684c03431f57dd2f322871a0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7873) 
   * b6dd143b27b0a24ae324799eb1827cdc2fad8939 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090442344

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869",
       "triggerID" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f952f1be4c1eed1100f6893cac1d8276fef554ff Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869) 
   * 5e90efba02542f69684c03431f57dd2f322871a0 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5241:
URL: https://github.com/apache/hudi/pull/5241#issuecomment-1090600280

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7869",
       "triggerID" : "f952f1be4c1eed1100f6893cac1d8276fef554ff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7873",
       "triggerID" : "5e90efba02542f69684c03431f57dd2f322871a0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5e90efba02542f69684c03431f57dd2f322871a0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7873) 
   




[GitHub] [hudi] nsivabalan merged pull request #5241: [HUDI-3810] Fixing lazy read for metadata log record readers

Posted by GitBox <gi...@apache.org>.
nsivabalan merged PR #5241:
URL: https://github.com/apache/hudi/pull/5241

