Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/18 21:55:16 UTC

[GitHub] [hudi] alexeykudinkin opened a new pull request, #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

alexeykudinkin opened a new pull request, #5352:
URL: https://github.com/apache/hudi/pull/5352

   ## What is the purpose of the pull request
   
   Unfortunately, Spark predicates some of its optimization `Rule`s (and some other handling) on the use of `HadoopFsRelation`, so those optimizations are not applied when we rely on our custom `Relation` implementations.
   
   To work around this for 0.11, we fall back to `HadoopFsRelation` whenever it is feasible to do so.
   
   ## Brief change log
   
    - Add a `toHadoopFsRelation` method to `BaseFileOnlyRelation`
    - Fall back to `HadoopFsRelation` for use cases that do not involve schema evolution
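
A minimal sketch of the idea described above, under stated assumptions: `Relation`, `HadoopFsStandIn`, `BaseFileOnlyStandIn`, `FallbackSketch`, `pruneFiles` and `resolve` are hypothetical stand-ins invented for illustration and are not Spark's or Hudi's real classes. The toy rule matches on one concrete type only, which is roughly why a custom relation misses the optimization, and `resolve` mirrors the fallback decision from the diff.

```scala
// Hypothetical stand-ins for illustration only; NOT Spark's or Hudi's real classes.
sealed trait Relation
final case class HadoopFsStandIn(files: Seq[String]) extends Relation
final case class BaseFileOnlyStandIn(files: Seq[String]) extends Relation {
  // Models the `toHadoopFsRelation` conversion added by this PR.
  def toHadoopFsStandIn: HadoopFsStandIn = HadoopFsStandIn(files)
}

object FallbackSketch {
  // Toy "optimization rule" keyed on one concrete type:
  // any other relation passes through untouched.
  def pruneFiles(relation: Relation, keep: String => Boolean): Relation =
    relation match {
      case HadoopFsStandIn(files) => HadoopFsStandIn(files.filter(keep))
      case other                  => other
    }

  // `schemaOnReadEnabled` models the diff's
  // `!tryFetchInternalSchema(metaClient).isEmptySchema` check.
  def resolve(files: Seq[String], schemaOnReadEnabled: Boolean): Relation = {
    val base = BaseFileOnlyStandIn(files)
    if (schemaOnReadEnabled) base else base.toHadoopFsStandIn
  }
}
```

In this sketch, a relation resolved with `schemaOnReadEnabled = false` is eligible for `pruneFiles`, while one resolved with `schemaOnReadEnabled = true` stays a `BaseFileOnlyStandIn` and is returned unchanged by the rule.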
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5352:
URL: https://github.com/apache/hudi/pull/5352#discussion_r852466064


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -206,4 +208,32 @@ class DefaultSource extends RelationProvider
                             parameters: Map[String, String]): Source = {
     new HoodieStreamSource(sqlContext, metadataPath, schema, parameters)
   }
+
+  private def resolveBaseFileOnlyRelation(sqlContext: SQLContext,
+                                          globPaths: Seq[Path],
+                                          userSchema: Option[StructType],
+                                          metaClient: HoodieTableMetaClient,
+                                          optParams: Map[String, String]) = {
+    val baseRelation = new BaseFileOnlyRelation(sqlContext, metaClient, optParams, userSchema, globPaths)
+    val enableSchemaOnRead: Boolean = !tryFetchInternalSchema(metaClient).isEmptySchema
+
+    // NOTE: We fallback to [[HadoopFsRelation]] in all of the cases except ones requiring usage of
+    //       [[BaseFileOnlyRelation]] to function correctly. This is necessary to maintain performance parity w/
+    //       vanilla Spark, since some of the Spark optimizations are predicated on the using of [[HadoopFsRelation]].
+    //
+    //       You can check out HUDI-3896 for more details
+    if (enableSchemaOnRead) {
+      baseRelation
+    } else {
+      baseRelation.toHadoopFsRelation
+    }
+  }
+
+  private def tryFetchInternalSchema(metaClient: HoodieTableMetaClient) =

Review Comment:
   Is schema evolution flippable for a given table? I mean, can someone enable it for a few commits, disable it, and then re-enable it after some time? If not, we might need to add it as a table config (enabling schema evolution). And if we already have one, shouldn't we rely on it rather than parsing the commit metadata every time? We can take it as a follow-up. Just curious.
   





[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101803323

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * de28a2a5f5a3d0ae73ed832104ee10c7a09513f5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101896089

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * de28a2a5f5a3d0ae73ed832104ee10c7a09513f5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8115) 
   * 6f2b0129ba44ba17b6bd1ba552e9724a62f8e96c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8119) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1102379779

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * b975d3295cd534ac04c7ee58bb7961fd9971597d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8130) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5352:
URL: https://github.com/apache/hudi/pull/5352#discussion_r852493194


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########

Review Comment:
   thanks!





[GitHub] [hudi] xiarixiaoyao commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101913209

   @alexeykudinkin 
    If we support pruning of nested schemas, can we avoid this performance problem?
   LGTM
   




[GitHub] [hudi] xushiyan merged pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
xushiyan merged PR #5352:
URL: https://github.com/apache/hudi/pull/5352




[GitHub] [hudi] alexeykudinkin commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101917354

   @xiarixiaoyao It will most likely solve this particular issue, but we will remain exposed to it becoming a problem again until we upstream the real fix and make `HadoopFsRelation` _extensible_ in Spark.




[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101929512

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * 6f2b0129ba44ba17b6bd1ba552e9724a62f8e96c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8119) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1102654401

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * b975d3295cd534ac04c7ee58bb7961fd9971597d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8130) 
   * cabf2cf679534f15b22f3b5daa77a75987667fa5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8141) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1102743626

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * cabf2cf679534f15b22f3b5daa77a75987667fa5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8141) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101801414

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   


[GitHub] [hudi] vinothchandar commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101833648

   @xiarixiaoyao Can you please skim this PR as well?




[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101874886

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * de28a2a5f5a3d0ae73ed832104ee10c7a09513f5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8115) 
   * 6f2b0129ba44ba17b6bd1ba552e9724a62f8e96c UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1102599527

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * b975d3295cd534ac04c7ee58bb7961fd9971597d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8130) 
   * cabf2cf679534f15b22f3b5daa77a75987667fa5 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101952743

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * 6f2b0129ba44ba17b6bd1ba552e9724a62f8e96c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8119) 
   * b975d3295cd534ac04c7ee58bb7961fd9971597d UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101805143

   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * de28a2a5f5a3d0ae73ed832104ee10c7a09513f5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8115) 
   


[GitHub] [hudi] vinothchandar commented on a diff in pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on code in PR #5352:
URL: https://github.com/apache/hudi/pull/5352#discussion_r852467806


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -206,4 +208,32 @@ class DefaultSource extends RelationProvider
                             parameters: Map[String, String]): Source = {
     new HoodieStreamSource(sqlContext, metadataPath, schema, parameters)
   }
+
+  private def resolveBaseFileOnlyRelation(sqlContext: SQLContext,
+                                          globPaths: Seq[Path],
+                                          userSchema: Option[StructType],
+                                          metaClient: HoodieTableMetaClient,
+                                          optParams: Map[String, String]) = {
+    val baseRelation = new BaseFileOnlyRelation(sqlContext, metaClient, optParams, userSchema, globPaths)
+    val enableSchemaOnRead: Boolean = !tryFetchInternalSchema(metaClient).isEmptySchema
+
+    // NOTE: We fall back to [[HadoopFsRelation]] in all cases except those that require
+    //       [[BaseFileOnlyRelation]] to function correctly. This is necessary to maintain performance parity w/
+    //       vanilla Spark, since some of the Spark optimizations are predicated on the use of [[HadoopFsRelation]].
+    //
+    //       You can check out HUDI-3896 for more details
+    if (enableSchemaOnRead) {
+      baseRelation
+    } else {
+      baseRelation.toHadoopFsRelation
+    }
+  }
+
+  private def tryFetchInternalSchema(metaClient: HoodieTableMetaClient) =

Review Comment:
   If we write with schema evolution on and then turn it off, the table may not be readable. So this does not apply here.
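
One way to read the point above, as a rough sketch rather than the PR's actual code: the choice of relation would be driven by what the table's metadata records (whether an internal schema was ever persisted), not by the reader's current config flag. Every name below (`TableMetadata`, `hasInternalSchema`, `RelationResolver`, and the two stand-in relation objects) is hypothetical and is not a Hudi or Spark class.

```scala
// Hypothetical stand-ins for illustration; not Hudi or Spark classes.
sealed trait Relation
case object CustomBaseFileOnlyRelation extends Relation // stands in for BaseFileOnlyRelation
case object PlainHadoopFsRelation extends Relation      // stands in for HadoopFsRelation

// Assumption: the table metadata can tell us whether an internal schema
// was persisted, i.e. the table was written with schema evolution at some point.
final case class TableMetadata(hasInternalSchema: Boolean)

object RelationResolver {
  // Decide from table state rather than from the reader's config:
  // a table written with evolution on still needs the custom relation
  // after the flag has been turned off.
  def resolve(meta: TableMetadata): Relation =
    if (meta.hasInternalSchema) CustomBaseFileOnlyRelation
    else PlainHadoopFsRelation
}

object Demo extends App {
  // A table that was evolved keeps the custom relation.
  assert(RelationResolver.resolve(TableMetadata(hasInternalSchema = true)) == CustomBaseFileOnlyRelation)
  // A table that never used evolution can take the fallback path.
  assert(RelationResolver.resolve(TableMetadata(hasInternalSchema = false)) == PlainHadoopFsRelation)
}
```

This mirrors the shape of the `enableSchemaOnRead` check in the diff, which is derived from `tryFetchInternalSchema(metaClient)` rather than from `optParams`.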





[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101846425

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "57f622f643f7c623129636f8e5000ffe014b0c0b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "57f622f643f7c623129636f8e5000ffe014b0c0b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de28a2a5f5a3d0ae73ed832104ee10c7a09513f5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8115",
       "triggerID" : "de28a2a5f5a3d0ae73ed832104ee10c7a09513f5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * de28a2a5f5a3d0ae73ed832104ee10c7a09513f5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8115) 
   




[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5352:
URL: https://github.com/apache/hudi/pull/5352#issuecomment-1101974409

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "57f622f643f7c623129636f8e5000ffe014b0c0b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "57f622f643f7c623129636f8e5000ffe014b0c0b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de28a2a5f5a3d0ae73ed832104ee10c7a09513f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8115",
       "triggerID" : "de28a2a5f5a3d0ae73ed832104ee10c7a09513f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f2b0129ba44ba17b6bd1ba552e9724a62f8e96c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8119",
       "triggerID" : "6f2b0129ba44ba17b6bd1ba552e9724a62f8e96c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b975d3295cd534ac04c7ee58bb7961fd9971597d",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8130",
       "triggerID" : "b975d3295cd534ac04c7ee58bb7961fd9971597d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN
   * 6f2b0129ba44ba17b6bd1ba552e9724a62f8e96c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8119) 
   * b975d3295cd534ac04c7ee58bb7961fd9971597d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8130) 
   

