Posted to commits@hudi.apache.org by "lvyanquan (Jira)" <ji...@apache.org> on 2023/04/05 12:32:00 UTC

[jira] [Updated] (HUDI-6041) add `properties` to Hudi Spark Procedures

     [ https://issues.apache.org/jira/browse/HUDI-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lvyanquan updated HUDI-6041:
----------------------------
    Description: 
To pass extra properties to the [Bootstrap Procedure|https://hudi.apache.org/docs/next/procedures#run_bootstrap], we currently have to write them to an HDFS file and set `props_file_path`, which makes the procedure troublesome to call. For example:
{code:sql}
call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE', 
bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table', 
base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table', 
rowKey_field => 'id', partition_path_field => 'dt',
props_file_path => 'hdfs://ns1//tmp/tableProp.txt'); {code}
Alternatively, we can set those properties through the session config, but that requires executing several `set` SQL statements before the call.


We can add a new procedure input parameter named `properties` and collect key-value pairs from it, for example:
{code:sql}
call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE', 
bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table', 
base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table', 
rowKey_field => 'id', partition_path_field => 'dt', 
properties => 'hoodie.datasource.write.hive_style_partitioning=true');  {code}
This way, we don't need to put another file on HDFS.
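
As a rough illustration of how the `properties` input could be turned into key-value pairs, here is a minimal Java sketch. It is not Hudi's actual implementation: the class name `ProcedureProperties`, the method `parse`, and the choice of `,` as the separator between pairs are all assumptions made for this example (the call above only shows a single `key=value` pair).

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Hudi.
final class ProcedureProperties {

    // Assumption: pairs are separated by ',' and each pair has the form key=value.
    // The split happens at the first '=', so a value may itself contain '='.
    static Properties parse(String raw) {
        Properties props = new Properties();
        if (raw == null || raw.trim().isEmpty()) {
            return props; // nothing to collect
        }
        for (String pair : raw.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray separators such as "a=1,,b=2"
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
            }
            String key = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            props.setProperty(key, value);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse("hoodie.datasource.write.hive_style_partitioning=true");
        System.out.println(props.getProperty("hoodie.datasource.write.hive_style_partitioning"));
    }
}
```

With a comma as the separator, a value cannot itself contain a comma, so a real implementation might prefer a different delimiter or an escaping rule.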

  was:we need to write properties to a hdfs file for [Bootstrap Procedure|https://hudi.apache.org/docs/next/procedures#run_bootstrap] and set `props_file_path` , it's troublesome to call this procedure.


> add `properties` to Hudi Spark Procedures
> -----------------------------------------
>
>                 Key: HUDI-6041
>                 URL: https://issues.apache.org/jira/browse/HUDI-6041
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: bootstrap, spark-sql
>            Reporter: lvyanquan
>            Priority: Major


