Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2023/03/30 02:13:00 UTC
[jira] [Updated] (HUDI-5516) Reduce memory footprint on workload with thousand active partitions
[ https://issues.apache.org/jira/browse/HUDI-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-5516:
--------------------------------------
Fix Version/s: 0.12.3
> Reduce memory footprint on workload with thousand active partitions
> -------------------------------------------------------------------
>
> Key: HUDI-5516
> URL: https://issues.apache.org/jira/browse/HUDI-5516
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: Alexander Trushev
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.13.0, 0.12.3
>
>
> We can reduce the memory footprint of workloads with thousands of active partitions between checkpoints. Such workloads arise when the checkpoint interval is wide. More precisely, an active partition here is a special case of an active fileId.
> The write client holds a map of write handles so that a ReplaceHandle can be created between checkpoints. This leads to an OutOfMemoryError on such workloads because each write handle is a large object.
> {code:sql}
> create table source (
> `id` int,
> `data` string
> ) with (
> 'connector' = 'datagen',
> 'rows-per-second' = '100',
> 'fields.id.kind' = 'sequence',
> 'fields.id.start' = '0',
> 'fields.id.end' = '3000'
> );
> create table sink (
> `id` int primary key,
> `data` string,
> `part` string
> ) partitioned by (`part`) with (
> 'connector' = 'hudi',
> 'path' = '/tmp/sink',
> 'write.batch.size' = '0.001', -- 1024 bytes
> 'write.task.max.size' = '101.001', -- 101.001MB
> 'write.merge.max_memory' = '1' -- 1024 bytes
> );
> insert into sink select `id`, `data`, concat('part', cast(`id` as string)) as `part` from source;
> {code}
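> The gist of the improvement can be sketched in plain Java (all names here are hypothetical, not the actual Hudi API): retain only lightweight per-fileId metadata between checkpoints instead of the full write handles, so the heavy handle state becomes eligible for garbage collection even with thousands of active fileIds.
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> public class HandleTrackerSketch {
>   // Hypothetical stand-in for a write handle: carries large in-memory state.
>   static class WriteHandle {
>     final String fileId;
>     final byte[] buffer = new byte[1024 * 1024]; // heavy per-handle state
>     WriteHandle(String fileId) { this.fileId = fileId; }
>   }
>
>   // Lightweight descriptor: just enough to build a replace handle later.
>   static class HandleMeta {
>     final String fileId;
>     final String partitionPath;
>     HandleMeta(String fileId, String partitionPath) {
>       this.fileId = fileId;
>       this.partitionPath = partitionPath;
>     }
>   }
>
>   public static void main(String[] args) {
>     Map<String, HandleMeta> metaByFileId = new HashMap<>();
>     for (int i = 0; i < 3000; i++) {
>       WriteHandle handle = new WriteHandle("file-" + i); // used during the write
>       // Only the small metadata is kept between checkpoints; the heavy
>       // handle (and its buffer) is no longer referenced and can be collected.
>       metaByFileId.put(handle.fileId, new HandleMeta(handle.fileId, "part" + i));
>     }
>     System.out.println(metaByFileId.size());
>   }
> }
> {code}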
--
This message was sent by Atlassian Jira
(v8.20.10#820010)