Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/05/04 09:14:00 UTC

[jira] [Updated] (HUDI-4001) "hoodie.datasource.write.operation" from table config should not be used as write operation

     [ https://issues.apache.org/jira/browse/HUDI-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4001:
---------------------------------
    Labels: pull-request-available  (was: )

> "hoodie.datasource.write.operation" from table config should not be used as write operation
> -------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4001
>                 URL: https://issues.apache.org/jira/browse/HUDI-4001
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: spark-sql
>            Reporter: Ethan Guo
>            Assignee: 董可伦
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.11.1
>
>
> [https://github.com/apache/hudi/issues/5248]
> When I use Spark SQL to create a table and set {*}hoodie.datasource.write.operation{*}=upsert,
> DELETE statements (like PR [#5215|https://github.com/apache/hudi/pull/5215]), INSERT OVERWRITE statements, etc. will still use *hoodie.datasource.write.operation* from the table config, so records are upserted rather than deleted, insert-overwritten, etc.
> For example:
> Create a table with hoodie.datasource.write.operation set to upsert.
> When I then run a DELETE statement, the operation key set by the command, *OPERATION.key -> DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL*, takes no effect: it is overwritten to *upsert* by hoodie.datasource.write.operation from the table config or environment.
> withSparkConf(sparkSession, hoodieCatalogTable.catalogProperties) {
>   Map(
>     "path" -> path,
>     RECORDKEY_FIELD.key -> hoodieCatalogTable.primaryKeys.mkString(","),
>     TBL_NAME.key -> tableConfig.getTableName,
>     HIVE_STYLE_PARTITIONING.key -> tableConfig.getHiveStylePartitioningEnable,
>     URL_ENCODE_PARTITIONING.key -> tableConfig.getUrlEncodePartitioning,
>     KEYGENERATOR_CLASS_NAME.key -> classOf[SqlKeyGenerator].getCanonicalName,
>     SqlKeyGenerator.ORIGIN_KEYGEN_CLASS_NAME -> tableConfig.getKeyGeneratorClassName,
>     OPERATION.key -> DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL,
>     PARTITIONPATH_FIELD.key -> tableConfig.getPartitionFieldProp,
>     HiveSyncConfig.HIVE_SYNC_MODE.key -> HiveSyncMode.HMS.name(),
>     HiveSyncConfig.HIVE_SUPPORT_TIMESTAMP_TYPE.key -> "true",
>     HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key -> "200",
>     SqlKeyGenerator.PARTITION_SCHEMA -> partitionSchema.toDDL
>   )
> }
> So, for SQL usage, how about not persisting hoodie.datasource.write.operation into hoodie.properties at all: reject it during SQL validation, and let each command generate the operation itself at runtime.
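The precedence problem described above can be sketched in plain Scala (no Spark needed). This is an illustrative reduction, not Hudi's actual code: it only models the idea that when the persisted table/catalog properties are merged over the per-command options, a stored hoodie.datasource.write.operation=upsert clobbers the DELETE command's operation, and that dropping the operation key from the persisted config before merging would restore command precedence. The object and method names here are hypothetical.

```scala
// Sketch of the reported precedence bug (names are illustrative, not Hudi's API).
object OperationPrecedenceSketch {
  val OperationKey = "hoodie.datasource.write.operation"

  // Buggy merge order: the table config is applied last, so its persisted
  // operation (e.g. "upsert") overrides the command-generated one ("delete").
  def buggyMerge(commandOpts: Map[String, String],
                 tableConfig: Map[String, String]): Map[String, String] =
    commandOpts ++ tableConfig

  // Direction proposed in the ticket: the write operation is generated by the
  // command at runtime, so strip it from the persisted config before merging.
  def fixedMerge(commandOpts: Map[String, String],
                 tableConfig: Map[String, String]): Map[String, String] =
    (tableConfig - OperationKey) ++ commandOpts

  def main(args: Array[String]): Unit = {
    val tableConfig = Map(OperationKey -> "upsert")   // persisted at CREATE TABLE
    val deleteCmd   = Map(OperationKey -> "delete")   // set by the DELETE command

    // Bug: the DELETE ends up running as an upsert.
    assert(buggyMerge(deleteCmd, tableConfig)(OperationKey) == "upsert")
    // Fix: the command-generated operation wins.
    assert(fixedMerge(deleteCmd, tableConfig)(OperationKey) == "delete")
  }
}
```

In Scala's `Map`, `a ++ b` lets entries from `b` win on key collisions, which is all the merge-order bug amounts to: the command's OPERATION entry sits on the losing side of the merge.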



--
This message was sent by Atlassian Jira
(v8.20.7#820007)