You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "weiming (Jira)" <ji...@apache.org> on 2022/10/14 11:03:00 UTC
[jira] [Created] (HUDI-5031) Hudi merge into creates empty partition files when the source table has partitions and the target table does not
weiming created HUDI-5031:
-----------------------------
Summary: Hudi merge into creates empty partition files when the source table has partitions and the target table does not
Key: HUDI-5031
URL: https://issues.apache.org/jira/browse/HUDI-5031
Project: Apache Hudi
Issue Type: Bug
Components: writer-core
Environment: hudi:release-0.11.0
spark: 3.2.1
Reporter: weiming
Fix For: 0.11.0
{{{}-{}}}{{{}-source{}}} table
{{create table hudi_test_wm_mor_01 (}}
{{ }}{{id int,}}
{{ }}{{name string,}}
{{ }}{{price double,}}
{{ }}{{ts bigint,}}
{{ }}{{dt string}}
{{) using hudi}}
{{tblproperties (}}
{{ }}{{type}} {{= }}{{{}'mor'{}}}{{{},{}}}
{{ }}{{primaryKey = }}{{{}'id'{}}}{{{},{}}}
{{ }}{{preCombineField = }}{{'ts'}}
{{)}}
{{partitioned by (dt);}}
{{{}-{}}}{{{}-target{}}} table
{{create table hudi_test_wm_mor_02 (}}
{{ }}{{id int,}}
{{ }}{{name string,}}
{{ }}{{price double,}}
{{ }}{{ts bigint,}}
{{ }}{{dt string}}
{{) using hudi}}
{{tblproperties (}}
{{ }}{{type}} {{= }}{{{}'mor'{}}}{{{},{}}}
{{ }}{{primaryKey = }}{{{}'id'{}}}{{{},{}}}
{{ }}{{preCombineField = }}{{'ts'}}
{{)}}
{{partitioned by (dt);}}
-- insert some data
{{{}insert into hudi_test_wm_mor_01 (id,name,price,ts,dt) values (12,{}}}{{{}'a12'{}}}{{{},23.234,1648871782,{}}}{{{}'2021-12-11'{}}}{{{});{}}}
{{{}insert into hudi_test_wm_mor_01 (id,name,price,ts,dt) values (13,{}}}{{{}'a13'{}}}{{{},24.234,1648871783,{}}}{{{}'2021-12-12'{}}}{{{});{}}}
{{{}insert into hudi_test_wm_mor_01 (id,name,price,ts,dt) values (14,{}}}{{{}'a14'{}}}{{{},25.234,1648871784,{}}}{{{}'2021-12-13'{}}}{{{});{}}}
{{{}insert into hudi_test_wm_mor_01 (id,name,price,ts,dt) values (15,{}}}{{{}'a15'{}}}{{{},26.234,1648871785,{}}}{{{}'2021-12-14'{}}}{{{});{}}}
{{{}insert into hudi_test_wm_mor_01 (id,name,price,ts,dt) values (16,{}}}{{{}'a16'{}}}{{{},27.234,1648871786,{}}}{{{}'2021-12-15'{}}}{{{});{}}}
{{{}{}}}{{{}{}}}
{{{}insert into hudi_test_wm_mor_02 (id,name,price,ts,dt) values (12,{}}}{{{}'target12'{}}}{{{},88.1,1648871782,{}}}{{{}'2021-12-11'{}}}{{{});{}}}
{{{}insert into hudi_test_wm_mor_02 (id,name,price,ts,dt) values (13,{}}}{{{}'target13'{}}}{{{},89.1,1648871783,{}}}{{{}'2021-12-12'{}}}{{{});{}}}
--merge operation
{{merge into hudi_test_wm_mor_02 h0}}
{{using (}}
{{ }}{{select}} {{id, name, price, ts, dt from hudi_test_wm_mor_01}}
{{ }}{{) s0}}
{{ }}{{on h0.id = s0.id and h0.dt = s0.dt}}
{{ }}{{when matched then update }}{{set}} {{* ;}}
Description:
After the merge sql executes, five partitions are created in the target table (2021-12-11, 2021-12-12, 2021-12-13, 2021-12-14, 2021-12-15).
Actually only two partitions of the data match, creating two partitions as expected (2021-12-11, 2021-12-12)
The remaining 3 partitions should not be created (2021-12-13, 2021-12-14, 2021-12-15).
In extreme cases, a very large number of empty partitions are created in the target table
--
This message was sent by Atlassian Jira
(v8.20.10#820010)