Posted to users@hudi.apache.org by vtygoss <vt...@126.com> on 2021/07/12 11:38:15 UTC

hudi hard deletion in flink sql AND detect deletions of upstream

Hi, 


I have two questions:


1. How do I specify hard deletes in hudi-flink-bundle-0.9.0?
2. How can a downstream Hudi Flink SQL streaming job detect deletion events? The downstream jobs need to detect deletions in the input Hudi table and act accordingly.




I tried to use org.apache.hudi.common.model.EmptyHoodieRecordPayload, but it seems that EmptyHoodieRecordPayload does not really delete the record; instead it emits null values for the non-primary-key fields? I am not sure. By the way, the class EmptyHoodieRecordPayload lacks a constructor that takes a parameter of class "org.apache.hudi.common.util.Option".


Please offer some advice. Thank you very much!


Best Regards!




```
CREATE TABLE t3(
  uuid VARCHAR(20), 
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20),
  primary key(uuid) not enforced
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://bruneihealth/user/data/db/hudi_flink/t3',
  'table.type' = 'MERGE_ON_READ',
  'read.tasks' = '1',
  'read.streaming.enabled' = 'true',  
  'read.streaming.check-interval' = '1', 
  'hoodie.datasource.write.partitionpath.field'='_hoodie_partition_path',
  'write.payload.class' = 'org.apache.hudi.common.model.EmptyHoodieRecordPayload',
  'compaction.async.enabled'='false'
);
```

Re: hudi hard deletion in flink sql AND detect deletions of upstream

Posted by Danny Chan <da...@apache.org>.
Hi vtygoss ~

By default, when consuming DELETEs from a CDC stream, the Flink writer
nullifies the payload instance so that the write handle recognizes these
DELETEs and performs a HARD delete, i.e. it writes nothing to the file for
those records.
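A minimal sketch of the pipeline described above, assuming the upstream is a Kafka topic carrying Debezium JSON change events. The source table `src_cdc`, the sink name `t3_sink`, the topic, group id and broker address are hypothetical placeholders; the sink reuses t3's path. The point is that `write.payload.class` is left unset, so DELETE rows arriving from the changelog are handled by the writer's default hard-delete path instead of through `EmptyHoodieRecordPayload`.

```sql
-- Hypothetical upstream: a Kafka topic with Debezium JSON change events.
-- Topic, group id and broker address are placeholders.
CREATE TABLE src_cdc (
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
) WITH (
  'connector' = 'kafka',
  'topic' = 'example_cdc_topic',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'example_group',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'debezium-json'  -- produces insert/update/delete change rows
);

-- Hudi sink: same schema as t3, but no custom 'write.payload.class',
-- so the writer's default DELETE handling applies.
CREATE TABLE t3_sink (
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20),
  PRIMARY KEY (uuid) NOT ENFORCED
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://bruneihealth/user/data/db/hudi_flink/t3',
  'table.type' = 'MERGE_ON_READ'
);

-- DELETE rows from src_cdc reach the Hudi writer, which drops the key.
INSERT INTO t3_sink
SELECT uuid, name, age, ts, `partition` FROM src_cdc;
```

Whether updates arriving as UPDATE_BEFORE/UPDATE_AFTER pairs are handled as expected depends on the Hudi and Flink versions in use, so it is worth verifying against a small test topic first.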

If you want to detect DELETEs downstream, you may need to wait for
HUDI-1771, which will keep DELETEs with the proper change flags.
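For illustration only, a sketch of what a downstream consumer could look like once change flags are retained. It assumes a Hudi release newer than 0.9.0 that exposes this through a `changelog.enabled` table option on MERGE_ON_READ tables; that option name should be checked against the Flink configuration docs of the version actually deployed, and it has to be enabled on the writing job as well. The table name `t3_changelog` is hypothetical.

```sql
-- Assumption: a Hudi version after 0.9.0 with 'changelog.enabled';
-- verify the option against your version's documentation.
CREATE TABLE t3_changelog (
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20),
  PRIMARY KEY (uuid) NOT ENFORCED
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://bruneihealth/user/data/db/hudi_flink/t3',
  'table.type' = 'MERGE_ON_READ',
  'changelog.enabled' = 'true',       -- keep change flags such as DELETE
  'read.streaming.enabled' = 'true',
  'read.streaming.check-interval' = '1'
);

-- A stateful downstream query: with DELETE flags retained, a row deleted
-- upstream retracts its contribution to the per-partition count.
SELECT `partition`, COUNT(*) AS cnt
FROM t3_changelog
GROUP BY `partition`;
```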

Best,
Danny Chan

On Monday, July 12, 2021 at 7:38 PM, vtygoss <vt...@126.com> wrote: