Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/11 06:09:35 UTC

[GitHub] [hudi] yihua opened a new pull request, #5840: [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table

yihua opened a new pull request, #5840:
URL: https://github.com/apache/hudi/pull/5840

   ## What is the purpose of the pull request
   
   Reading the metadata table directly by its path in Spark, e.g., `spark.read.format("hudi").load("<base_path>/.hoodie/metadata/").show`, throws a `NullPointerException` from `getLogRecordScanner`:
   ```
   Caused by: java.lang.NullPointerException
      at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:484)
      at org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:342)
      at org.apache.hudi.HoodieMergeOnReadRDD$LogFileIterator.<init>(HoodieMergeOnReadRDD.scala:173)
      at org.apache.hudi.HoodieMergeOnReadRDD$RecordMergingFileIterator.<init>(HoodieMergeOnReadRDD.scala:252)
      at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:101)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
   ```
   The root cause is that, in `HoodieMergeOnReadRDD.scanLog`, `tableState.metadataConfig` does not have `hoodie.metadata.enable` set to `true` by default.  As a result, the `HoodieBackedTableMetadata` instantiated from that config does not initialize `metadataMetaClient`, which causes the NPE.  Because the user explicitly specifies the metadata table path for reading in this case, `hoodie.metadata.enable` should be overridden to `true` so that the read behaves correctly.
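   
   The failure mode can be illustrated with a minimal, self-contained sketch. Every class and method below (`MetadataConfig`, `MetaClient`, `BackedTableMetadata`, `RootCauseSketch.scan`) is a hypothetical stand-in for illustration, not the real Hudi API; the sketch only assumes, as described above, that the meta client is left uninitialized when the enable flag is off and is then dereferenced without a null check.
   
   ```scala
   // Hypothetical stand-ins for the Hudi classes named above; not the real API.
   final case class MetadataConfig(enabled: Boolean)
   
   final class MetaClient(val basePath: String)
   
   final class BackedTableMetadata(config: MetadataConfig, basePath: String) {
     // Assumption: the meta client is only created when the metadata table
     // is enabled in the config; otherwise the field stays null.
     private val metadataMetaClient: MetaClient =
       if (config.enabled) new MetaClient(basePath) else null
   
     // Dereferences the meta client without a null check, so a disabled
     // config surfaces as a NullPointerException here.
     def getLogRecordScanner(): String =
       s"scanner for ${metadataMetaClient.basePath}"
   }
   
   object RootCauseSketch {
     // Left carries the failure name, Right the scanner description.
     def scan(config: MetadataConfig, basePath: String): Either[String, String] =
       try Right(new BackedTableMetadata(config, basePath).getLogRecordScanner())
       catch { case _: NullPointerException => Left("NullPointerException") }
   }
   ```
   
   With `MetadataConfig(enabled = false)` the call fails with the NPE, while the same call with `enabled = true` succeeds, which is why forcing the flag on for an explicit metadata table read avoids the error.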
   
   ## Brief change log
   
     - In `HoodieMergeOnReadRDD.scanLog`, rebuild the `HoodieMetadataConfig` with `hoodie.metadata.enable` set to `true`
     - Fix `TestMetadataTableWithSparkDataSource` to follow the common pattern for reading the metadata table, i.e., `spark.read.format("hudi").load("<base_path>/.hoodie/metadata/")`, without setting any options
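   
   As a rough sketch of the "rebuild with the flag forced on" idea in the first item, the snippet below uses plain `java.util.Properties` as a stand-in for `HoodieMetadataConfig`. The object and method names (`MetadataEnableOverride`, `withMetadataEnabled`) are hypothetical and are not taken from this PR; only the key `hoodie.metadata.enable` comes from the description above.
   
   ```scala
   import java.util.Properties
   
   // Hypothetical helper; stands in for rebuilding HoodieMetadataConfig.
   object MetadataEnableOverride {
     val MetadataEnableKey = "hoodie.metadata.enable"
   
     // Returns a copy of `props` with the enable flag forced to "true".
     // All other string entries are preserved and the input is not mutated.
     def withMetadataEnabled(props: Properties): Properties = {
       val copy = new Properties()
       val names = props.stringPropertyNames().iterator()
       while (names.hasNext) {
         val key = names.next()
         copy.setProperty(key, props.getProperty(key))
       }
       copy.setProperty(MetadataEnableKey, "true")
       copy
     }
   }
   ```
   
   Copying instead of mutating keeps the caller's config untouched, so the override applies only to the read path that needs it.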
    
   ## Verify this pull request
   
   Before this PR, `TestMetadataTableWithSparkDataSource` fails when reading with `spark.read.format("hudi").load("<base_path>/.hoodie/metadata/")`.  After this PR, the test class passes.  The Spark read of the metadata table was also verified with Spark 2.4.4, 3.1.3, and 3.2.1, both locally and on S3.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua merged pull request #5840: [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table

Posted by GitBox <gi...@apache.org>.
yihua merged PR #5840:
URL: https://github.com/apache/hudi/pull/5840




[GitHub] [hudi] hudi-bot commented on pull request #5840: [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5840:
URL: https://github.com/apache/hudi/pull/5840#issuecomment-1152885586

   ## CI report:
   
   * a102f13d67a4d76525df0fc74b1a6759263bc7e4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9227) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on pull request #5840: [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #5840:
URL: https://github.com/apache/hudi/pull/5840#issuecomment-1152878045

   @alexeykudinkin could you also review this?




[GitHub] [hudi] hudi-bot commented on pull request #5840: [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5840:
URL: https://github.com/apache/hudi/pull/5840#issuecomment-1152872956

   ## CI report:
   
   * db97f790f64eaa405b15a06b9ab8a3bbdf5aa7e8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9226) 
   * a102f13d67a4d76525df0fc74b1a6759263bc7e4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9227) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5840: [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5840:
URL: https://github.com/apache/hudi/pull/5840#issuecomment-1152869982

   ## CI report:
   
   * db97f790f64eaa405b15a06b9ab8a3bbdf5aa7e8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9226) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5840: [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5840:
URL: https://github.com/apache/hudi/pull/5840#issuecomment-1152877877

   ## CI report:
   
   * db97f790f64eaa405b15a06b9ab8a3bbdf5aa7e8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9226) 
   * a102f13d67a4d76525df0fc74b1a6759263bc7e4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9227) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5840: [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5840:
URL: https://github.com/apache/hudi/pull/5840#issuecomment-1152865150

   ## CI report:
   
   * db97f790f64eaa405b15a06b9ab8a3bbdf5aa7e8 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5840: [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5840:
URL: https://github.com/apache/hudi/pull/5840#issuecomment-1152870468

   ## CI report:
   
   * db97f790f64eaa405b15a06b9ab8a3bbdf5aa7e8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9226) 
   * a102f13d67a4d76525df0fc74b1a6759263bc7e4 UNKNOWN
   