Posted to commits@hudi.apache.org by "Sagar Sumit (Jira)" <ji...@apache.org> on 2022/06/06 05:42:00 UTC

[jira] [Commented] (HUDI-4184) Creating external table in Spark SQL modifies "hoodie.properties"

    [ https://issues.apache.org/jira/browse/HUDI-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550305#comment-17550305 ] 

Sagar Sumit commented on HUDI-4184:
-----------------------------------

This happens when HoodieCatalogTable initializes an existing Hudi table: during initialization it attempts to set the table schema in hoodie.properties. See [https://github.com/apache/hudi/blob/22c45a7704cf4d5ec6fb56ee7cc1bf17d826315d/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala#L176]

Btw, it would happen even while creating a new table (check #L186 in the same class).

The schema stored this way is used as a fallback (in TableSchemaResolver) whenever the schema from commit metadata is not present.
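
To make the fallback order concrete, here is a minimal sketch in Python (not the actual TableSchemaResolver code). The function name, its parameters, and the property key "hoodie.table.create.schema" are assumptions for illustration only; please check them against the Hudi version in use.

```python
from typing import Mapping, Optional

# Assumed property key, used here only for illustration.
CREATE_SCHEMA_KEY = "hoodie.table.create.schema"


def resolve_table_schema(
    commit_metadata_schema: Optional[str],
    table_properties: Mapping[str, str],
) -> Optional[str]:
    """Prefer the schema from commit metadata; otherwise use the stored one."""
    if commit_metadata_schema:
        return commit_metadata_schema
    # Fallback: commit metadata carries no schema, so read hoodie.properties.
    return table_properties.get(CREATE_SCHEMA_KEY)
```

Note that the lookup itself only reads the properties; the write happens elsewhere, during catalog table initialization.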

First, we need to reconsider whether the schema should be part of hoodie.properties at all: once schema evolution is turned on by default in the near future, the schema will already be part of the internal schema metadata.

Second, even if we keep it in hoodie.properties, it should be added when the Hudi table is originally created, irrespective of the client (Spark datasource, Spark SQL, etc.).
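
A rough sketch of that second point, again in Python and not taken from HoodieCatalogTable: the schema is written only on the creation path, and initializing an existing table never writes back. The function init_catalog_table and the key name are hypothetical and only illustrate the proposed behaviour.

```python
from typing import Dict, Optional

# Assumed property key, used here only for illustration.
CREATE_SCHEMA_KEY = "hoodie.table.create.schema"


def init_catalog_table(
    existing_properties: Optional[Dict[str, str]],
    schema: str,
) -> Dict[str, str]:
    """Return the table properties; record the schema only for a new table."""
    if existing_properties is not None:
        # Existing table: read-only path, return a copy and write nothing back.
        return dict(existing_properties)
    # New table: the schema is recorded once, at creation time.
    return {CREATE_SCHEMA_KEY: schema}
```

With this split, registering an existing table in the catalog would need only read access to the table location.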

cc: [~mengtao] [~xushiyan] [~alexey.kudinkin]

> Creating external table in Spark SQL modifies "hoodie.properties"
> -----------------------------------------------------------------
>
>                 Key: HUDI-4184
>                 URL: https://issues.apache.org/jira/browse/HUDI-4184
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Assignee: Sagar Sumit
>            Priority: Critical
>
> My setup was as follows:
>  # There's a table existing in one AWS account
>  # I'm trying to access that table from Spark SQL from _another_ AWS account that only has Read permissions to the bucket with the table.
>  # Now, when issuing the "CREATE TABLE" Spark SQL command, it fails because Hudi tries to modify the "hoodie.properties" file for whatever reason, even though I'm not modifying the table and am just trying to create the table in the catalog.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)