Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/28 22:52:42 UTC

[GitHub] [hudi] yihua opened a new pull request, #7088: [HUDI-] Add the feature flag back to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

yihua opened a new pull request, #7088:
URL: https://github.com/apache/hudi/pull/7088

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1354667679

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "",
       "status" : "DELETED",
       "url" : "TBD",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13778",
       "triggerID" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1fa8218eeffdede8c3f6f743799a45e0528d4403",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13795",
       "triggerID" : "1fa8218eeffdede8c3f6f743799a45e0528d4403",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1fa8218eeffdede8c3f6f743799a45e0528d4403 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13795) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #7088:
URL: https://github.com/apache/hudi/pull/7088#discussion_r1050267183


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -91,7 +90,7 @@ object DataSourceReadOptions {
 
   val ENABLE_HOODIE_FILE_INDEX: ConfigProperty[Boolean] = ConfigProperty
     .key("hoodie.file.index.enable")
-    .defaultValue(true)
+    .defaultValue(false)

Review Comment:
   Reverted.





[GitHub] [hudi] yihua commented on a diff in pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #7088:
URL: https://github.com/apache/hudi/pull/7088#discussion_r1050368785


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##########
@@ -134,7 +135,9 @@ class BaseFileOnlyRelation(sqlContext: SQLContext,
    *       rule; you can find more details in HUDI-3896)
    */
   def toHadoopFsRelation: HadoopFsRelation = {
-    if (globPaths.isEmpty) {
+    val enableFileIndex = optParams.get(ENABLE_HOODIE_FILE_INDEX.key)

Review Comment:
   Fixed.





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #7088:
URL: https://github.com/apache/hudi/pull/7088#discussion_r1050274498


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##########
@@ -134,7 +135,9 @@ class BaseFileOnlyRelation(sqlContext: SQLContext,
    *       rule; you can find more details in HUDI-3896)
    */
   def toHadoopFsRelation: HadoopFsRelation = {
-    if (globPaths.isEmpty) {
+    val enableFileIndex = optParams.get(ENABLE_HOODIE_FILE_INDEX.key)

Review Comment:
   Let's reuse `getBooleanConfigValue` here so that this can be set from SQLConf
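
A minimal sketch of the lookup order this comment appears to ask for, written in Python purely for illustration: check the per-read options first, then a session-level configuration, then the default. The function `resolve_boolean_config` and the plain dictionaries are hypothetical stand-ins; they are not the actual `getBooleanConfigValue` helper or Spark's `SQLConf`, and the precedence shown is an assumption.

```python
def resolve_boolean_config(options, session_conf, key, default):
    """Resolve a boolean flag: per-read options win, then session conf, then default.

    `options` and `session_conf` are plain dicts standing in for the read
    options and the session-level configuration (both hypothetical here).
    """
    raw = options.get(key, session_conf.get(key, str(default)))
    return str(raw).strip().lower() == "true"


KEY = "hoodie.file.index.enable"

# Nothing set anywhere: the default applies.
print(resolve_boolean_config({}, {}, KEY, True))
# Set only at session level: the session value applies.
print(resolve_boolean_config({}, {KEY: "false"}, KEY, True))
# Set in both places: the per-read option takes precedence.
print(resolve_boolean_config({KEY: "true"}, {KEY: "false"}, KEY, True))
```

With a lookup like this, the flag could be turned off once per session rather than on every read, which seems to be the point of the suggestion.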





[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1346974265

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     }, {
       "hash" : "",
       "status" : "CANCELED",
       "url" : "TBD",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * aa95345864c5b405b6e307512afa21c912dc78ed Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1354032837

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "",
       "status" : "DELETED",
       "url" : "TBD",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13778",
       "triggerID" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aa95345864c5b405b6e307512afa21c912dc78ed Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651) 
   * cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13778) 
   




[GitHub] [hudi] sachinmn2010 commented on pull request #7088: [HUDI-5104] Add feature flag to disable `HoodieFileIndex` and fall back to `HoodieROTablePathFilter`

Posted by "sachinmn2010 (via GitHub)" <gi...@apache.org>.
sachinmn2010 commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1438401766

   @yihua I was also facing the issue HUDI-5092 while reading a Hudi table from Databricks using Hudi bundle 0.11.1. I tried the same Databricks version you mentioned above with Hudi 0.12.2, but it's still failing. Could you please help me with it?
   
   Databricks Runtime : `11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)`
   Hudi Spark bundle : `org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.2`
   
   Error :
   ```
   Py4JJavaError: An error occurred while calling o776.showString.
   : java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SQLConf.parquetFilterPushDownStringStartWith()Z
   	at org.apache.spark.sql.execution.datasources.parquet.Spark32PlusHoodieParquetFileFormat.buildReaderWithPartitionValues(Spark32PlusHoodieParquetFileFormat.scala:129)
   	at org.apache.spark.sql.execution.datasources.parquet.HoodieParquetFileFormat.buildReaderWithPartitionValues(HoodieParquetFileFormat.scala:50)
   	at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:1914)
   	at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:1895)
   	at org.apache.spark.sql.execution.FileSourceScanExec.doExecuteColumnar(DataSourceScanExec.scala:1970)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:254)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:271)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:267)
   ```




[GitHub] [hudi] yihua commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1346875642

   > Hi @yihua Is it possible for you to complete the PR? Faced the same issue and we are blocked by this. We would be grateful.
   
   Hi @Korzi Sorry for the delay.  I'm working on this PR today.  Hope to land it before the 0.12.2 release code freeze.




[GitHub] [hudi] yihua commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1346880004

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1354230033

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "",
       "status" : "DELETED",
       "url" : "TBD",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13778",
       "triggerID" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1fa8218eeffdede8c3f6f743799a45e0528d4403",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13795",
       "triggerID" : "1fa8218eeffdede8c3f6f743799a45e0528d4403",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13778) 
   * 1fa8218eeffdede8c3f6f743799a45e0528d4403 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13795) 
   




[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #7088:
URL: https://github.com/apache/hudi/pull/7088#discussion_r1008582378


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -91,7 +90,7 @@ object DataSourceReadOptions {
 
   val ENABLE_HOODIE_FILE_INDEX: ConfigProperty[Boolean] = ConfigProperty
     .key("hoodie.file.index.enable")
-    .defaultValue(true)
+    .defaultValue(false)

Review Comment:
   @yihua let's make sure we at least run TestCOW*, TestMOR* tables w/ this config off as well





[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1295628313

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2c6b27e41c5752ed278365a553c4e21139b8c276 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1354223045

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "",
       "status" : "DELETED",
       "url" : "TBD",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13778",
       "triggerID" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1fa8218eeffdede8c3f6f743799a45e0528d4403",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1fa8218eeffdede8c3f6f743799a45e0528d4403",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13778) 
   * 1fa8218eeffdede8c3f6f743799a45e0528d4403 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1295620836

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2c6b27e41c5752ed278365a553c4e21139b8c276 UNKNOWN
   




[GitHub] [hudi] MarcBarreiro commented on pull request #7088: [HUDI-5104] Add feature flag to disable `HoodieFileIndex` and fall back to `HoodieROTablePathFilter`

Posted by "MarcBarreiro (via GitHub)" <gi...@apache.org>.
MarcBarreiro commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1653009601

   Is `'hoodie.file.index.enable': 'false'` the way to go with PySpark and Azure Databricks? While I don't get the error 
   
   `NoSuchMethodError: org.apache.spark.sql.execution.datasources.FileStatusCache.putLeafFiles(Lorg/apache/hadoop/fs/Path;[Lorg/apache/hadoop/fs/FileStatus;)V`
   
   I get an empty table when reading. I need to use the glob syntax to recover the data: 
   
   ```
   hudi_read_options = {
       'hoodie.datasource.read.partitionpath.field': "partitionpath",
       'hoodie.file.index.enable': 'false',
   }
   df = spark.read.format("hudi").options(**hudi_read_options).load(basePath + "/*/*/*")
   ```
   
   Is this behaviour expected when `hudi built-in FileIndex` is disabled?
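
For readers hitting the same situation, the glob path used above can be built with a small helper. This is only a sketch: `glob_load_path` is a hypothetical name, `/tmp/hudi_table` is a placeholder path, and the number of wildcard levels a given table needs depends on its directory layout, which this thread does not specify.

```python
def glob_load_path(base_path, wildcard_levels):
    """Append one '/*' per directory level to match under the table base path."""
    if wildcard_levels < 0:
        raise ValueError("wildcard_levels must be non-negative")
    # Drop any trailing slash so the result never contains '//'.
    return base_path.rstrip("/") + "/*" * wildcard_levels


# Three levels reproduces the "/*/*/*" suffix from the snippet above.
print(glob_load_path("/tmp/hudi_table", 3))
```

The resulting string would be passed to `load(...)` in place of `basePath + "/*/*/*"`.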




[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #7088:
URL: https://github.com/apache/hudi/pull/7088#discussion_r1020534097


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -91,7 +90,7 @@ object DataSourceReadOptions {
 
   val ENABLE_HOODIE_FILE_INDEX: ConfigProperty[Boolean] = ConfigProperty
     .key("hoodie.file.index.enable")
-    .defaultValue(true)
+    .defaultValue(false)

Review Comment:
   Don't forget to revert this



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##########
@@ -134,7 +135,9 @@ class BaseFileOnlyRelation(sqlContext: SQLContext,
    *       rule; you can find more details in HUDI-3896)
    */
   def toHadoopFsRelation: HadoopFsRelation = {
-    if (globPaths.isEmpty) {
+    val enableFileIndex = optParams.get(ENABLE_HOODIE_FILE_INDEX.key)
+      .map(_.toBoolean).getOrElse(ENABLE_HOODIE_FILE_INDEX.defaultValue)
+    if (globPaths.isEmpty && enableFileIndex) {

Review Comment:
   nit: let's flip the order, to stress relative importance
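
The condition under review can be sketched as below, in Python for illustration only. `use_file_index` is a hypothetical name; the sketch assumes, based on the PR title, that any read not taking the file-index branch falls back to the path-filter route.

```python
def use_file_index(enable_file_index, glob_paths):
    """Decide whether a read goes through the file index.

    The flag is checked first, in the order the review suggests, so that a
    disabled flag short-circuits before glob paths are even considered.
    """
    return enable_file_index and len(glob_paths) == 0


# Flag on, no glob paths: file index is used.
print(use_file_index(True, []))
# Flag off: fall back regardless of glob paths.
print(use_file_index(False, []))
# Glob paths given: fall back even with the flag on.
print(use_file_index(True, ["/tmp/hudi_table/*/*"]))
```

Both orderings give the same result; putting the flag first only makes the feature switch the most visible part of the condition.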





[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1346963583

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "",
       "status" : "CANCELED",
       "url" : "TBD",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * aa95345864c5b405b6e307512afa21c912dc78ed UNKNOWN
   




[GitHub] [hudi] yihua commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1354203903

   > @yihua LGTM
   > 
   > I think we need to trim down which tests we extend with the filter case -- we're essentially doubling the number of tests now for most of these cases, which i don't think is a good idea based on where we're at w/ test runtime currently.
   > 
   > I'd suggest instead we strategically pick only the most relevant test in TestCOW*/MOR* suites and extend only them (for ex, partition pruning related ones, key-generator ones, etc)
   
   @alexeykudinkin Makes sense.  I didn't extend every test case.  As discussed, after revisiting the modified tests, we trimmed the new tests down to only those that can be affected by file listing.




[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1295681437

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2c6b27e41c5752ed278365a553c4e21139b8c276 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657) 
   




[GitHub] [hudi] Korzi commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
Korzi commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1330556537

   Hi @yihua 
   Would it be possible for you to complete this PR? We are facing the same issue and are blocked by it. We would be grateful.




[GitHub] [hudi] yihua commented on a diff in pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #7088:
URL: https://github.com/apache/hudi/pull/7088#discussion_r1050074609


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##########
@@ -134,7 +135,9 @@ class BaseFileOnlyRelation(sqlContext: SQLContext,
    *       rule; you can find more details in HUDI-3896)
    */
   def toHadoopFsRelation: HadoopFsRelation = {
-    if (globPaths.isEmpty) {
+    val enableFileIndex = optParams.get(ENABLE_HOODIE_FILE_INDEX.key)
+      .map(_.toBoolean).getOrElse(ENABLE_HOODIE_FILE_INDEX.defaultValue)
+    if (globPaths.isEmpty && enableFileIndex) {

Review Comment:
   Makes sense.  Fixed.
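
For readers following the diff above, here is a minimal, dependency-free sketch of the condition it introduces. The option key `hoodie.file.index.enable` and its default of `true` come from the PR; the object name `FileIndexFlagSketch` and its helper methods are hypothetical and are not Hudi's actual code.

```scala
// Hypothetical sketch, not the Hudi implementation: it only mirrors the
// boolean logic of the diff using plain Scala collections.
object FileIndexFlagSketch {
  // Key as shown in DataSourceOptions.scala; the default is assumed to be
  // `true`, the value the PR keeps once the temporary CI change is reverted.
  val EnableFileIndexKey: String = "hoodie.file.index.enable"
  val EnableFileIndexDefault: Boolean = true

  /** Reads the flag from the read options, falling back to the default. */
  def fileIndexEnabled(optParams: Map[String, String]): Boolean =
    optParams.get(EnableFileIndexKey).map(_.toBoolean).getOrElse(EnableFileIndexDefault)

  /** The file index is used only when no glob paths are given and the flag is on. */
  def useFileIndex(globPaths: Seq[String], optParams: Map[String, String]): Boolean =
    globPaths.isEmpty && fileIndexEnabled(optParams)
}
```

With this logic, passing `hoodie.file.index.enable=false` as a read option should take the fallback branch even when no glob paths are supplied, which is the behavior the PR title describes.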





[GitHub] [hudi] lubomir-angelov commented on pull request #7088: [HUDI-5104] Add feature flag to disable `HoodieFileIndex` and fall back to `HoodieROTablePathFilter`

Posted by "lubomir-angelov (via GitHub)" <gi...@apache.org>.
lubomir-angelov commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1517360063

   Following this thread, as we are facing the same issue with Azure Databricks LTS 11.3 and 12.2 and Hudi 0.12.2 for Spark 3.3.




[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1354025975

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "",
       "status" : "DELETED",
       "url" : "TBD",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aa95345864c5b405b6e307512afa21c912dc78ed Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651) 
   * cc2106bc69cdc72e44d9f47bd720c80b5ac93f9b UNKNOWN
   




[GitHub] [hudi] yihua commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1354021048

   I also created a ticket to improve `TestCOWDataSource` and `TestMORDataSource` later on: [HUDI-5397](https://issues.apache.org/jira/browse/HUDI-5397)




[GitHub] [hudi] codope merged pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
codope merged PR #7088:
URL: https://github.com/apache/hudi/pull/7088




[GitHub] [hudi] hudi-bot commented on pull request #7088: [HUDI-5104] Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7088:
URL: https://github.com/apache/hudi/pull/7088#issuecomment-1347323241

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12657",
       "triggerID" : "2c6b27e41c5752ed278365a553c4e21139b8c276",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651",
       "triggerID" : "aa95345864c5b405b6e307512afa21c912dc78ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "",
       "status" : "DELETED",
       "url" : "TBD",
       "triggerID" : "1346880004",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * aa95345864c5b405b6e307512afa21c912dc78ed Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13651) 
   




[GitHub] [hudi] yihua commented on a diff in pull request #7088: [HUDI-] Add the feature flag back to disable HoodieFileIndex and fall back to HoodieROTablePathFilter

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #7088:
URL: https://github.com/apache/hudi/pull/7088#discussion_r1008567676


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -91,7 +90,7 @@ object DataSourceReadOptions {
 
   val ENABLE_HOODIE_FILE_INDEX: ConfigProperty[Boolean] = ConfigProperty
     .key("hoodie.file.index.enable")
-    .defaultValue(true)
+    .defaultValue(false)

Review Comment:
   This change runs the CI with `hoodie.file.index.enable=false` to ensure that the relation works without the `HoodieFileIndex`.  It will be reverted before merging.


