Posted to commits@hudi.apache.org by "lvyanquan (Jira)" <ji...@apache.org> on 2023/04/05 12:32:00 UTC
[jira] [Updated] (HUDI-6041) add `properties` to Hudi Spark Procedures
[ https://issues.apache.org/jira/browse/HUDI-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lvyanquan updated HUDI-6041:
----------------------------
Description:
To pass extra properties to the [Bootstrap Procedure|https://hudi.apache.org/docs/next/procedures#run_bootstrap], we currently have to write them to a file on HDFS and point `props_file_path` at it, which makes the procedure troublesome to call. For example:
{code:sql}
call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE',
  bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table',
  base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table',
  rowKey_field => 'id', partition_path_field => 'dt',
  props_file_path => 'hdfs://ns1//tmp/tableProp.txt');
{code}
Alternatively, we can set those properties through the session config, but that requires executing additional `set` statements before calling the procedure.
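A minimal sketch of the session-config approach, assuming the procedure picks up Hudi options set in the current Spark SQL session (as stated above, not verified here). The option is the same one used in the `properties` example in this description, and the call is otherwise unchanged apart from dropping `props_file_path`:
{code:sql}
-- Assumption: options set in the session are honored by run_bootstrap.
set hoodie.datasource.write.hive_style_partitioning=true;

call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE',
  bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table',
  base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table',
  rowKey_field => 'id', partition_path_field => 'dt');
{code}
Each additional property needs its own `set` statement, and the setting stays in effect for the rest of the session unless it is changed again.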
We can add a new procedure input parameter named `properties` that collects key-value pairs, like:
{code:sql}
call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE',
  bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table',
  base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table',
  rowKey_field => 'id', partition_path_field => 'dt',
  properties => 'hoodie.datasource.write.hive_style_partitioning=true');
{code}
This way, we don't need to put another file on HDFS.
was: We need to write properties to an HDFS file for the [Bootstrap Procedure|https://hudi.apache.org/docs/next/procedures#run_bootstrap] and set `props_file_path`, which makes it troublesome to call this procedure.
> add `properties` to Hudi Spark Procedures
> -----------------------------------------
>
> Key: HUDI-6041
> URL: https://issues.apache.org/jira/browse/HUDI-6041
> Project: Apache Hudi
> Issue Type: Improvement
> Components: bootstrap, spark-sql
> Reporter: lvyanquan
> Priority: Major