Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2023/03/30 02:13:00 UTC
[jira] [Updated] (HUDI-5516) Reduce memory footprint on workload with thousand active partitions
[ https://issues.apache.org/jira/browse/HUDI-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-5516:
--------------------------------------
Fix Version/s: 0.12.3
> Reduce memory footprint on workload with thousand active partitions
> -------------------------------------------------------------------
>
> Key: HUDI-5516
> URL: https://issues.apache.org/jira/browse/HUDI-5516
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: Alexander Trushev
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.13.0, 0.12.3
>
>
> We can reduce the memory footprint of workloads with thousands of active partitions between checkpoints. Such workloads arise when the checkpoint interval is wide. More precisely, an active partition here is a special case of an active fileId.
> The write client holds a map of write handles so that a ReplaceHandle can be created between checkpoints. This leads to an OutOfMemoryError on such workloads because each write handle is a large object.
> {code:sql}
> create table source (
> `id` int,
> `data` string
> ) with (
> 'connector' = 'datagen',
> 'rows-per-second' = '100',
> 'fields.id.kind' = 'sequence',
> 'fields.id.start' = '0',
> 'fields.id.end' = '3000'
> );
> create table sink (
> `id` int primary key,
> `data` string,
> `part` string
> ) partitioned by (`part`) with (
> 'connector' = 'hudi',
> 'path' = '/tmp/sink',
> 'write.batch.size' = '0.001', -- 1024 bytes
> 'write.task.max.size' = '101.001', -- 101.001MB
> 'write.merge.max_memory' = '1' -- 1024 bytes
> );
> insert into sink select `id`, `data`, concat('part', cast(`id` as string)) as `part` from source;
> {code}
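> The gist of the improvement can be sketched in plain Java (all names here are hypothetical, not the actual Hudi API): retain only lightweight per-fileId metadata between checkpoints instead of the full write handles, so the heavy handle state becomes eligible for garbage collection even with thousands of active fileIds.
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> public class HandleTrackerSketch {
>   // Hypothetical stand-in for a write handle: carries large in-memory state.
>   static class WriteHandle {
>     final String fileId;
>     final byte[] buffer = new byte[1024 * 1024]; // heavy per-handle state
>     WriteHandle(String fileId) { this.fileId = fileId; }
>   }
>
>   // Lightweight descriptor: just enough to build a replace handle later.
>   static class HandleMeta {
>     final String fileId;
>     final String partitionPath;
>     HandleMeta(String fileId, String partitionPath) {
>       this.fileId = fileId;
>       this.partitionPath = partitionPath;
>     }
>   }
>
>   public static void main(String[] args) {
>     Map<String, HandleMeta> metaByFileId = new HashMap<>();
>     for (int i = 0; i < 3000; i++) {
>       WriteHandle handle = new WriteHandle("file-" + i); // used during the write
>       // Only the small metadata is kept between checkpoints; the heavy
>       // handle (and its buffer) is no longer referenced and can be collected.
>       metaByFileId.put(handle.fileId, new HandleMeta(handle.fileId, "part" + i));
>     }
>     System.out.println(metaByFileId.size());
>   }
> }
> {code}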
--
This message was sent by Atlassian Jira
(v8.20.10#820010)