Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/10/26 21:56:00 UTC
[jira] [Assigned] (HUDI-4919) Sql MERGE INTO incurs too much memory overhead
[ https://issues.apache.org/jira/browse/HUDI-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin reassigned HUDI-4919:
-------------------------------------
Assignee: Alexey Kudinkin
> Sql MERGE INTO incurs too much memory overhead
> ----------------------------------------------
>
> Key: HUDI-4919
> URL: https://issues.apache.org/jira/browse/HUDI-4919
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark-sql
> Reporter: sivabalan narayanan
> Assignee: Alexey Kudinkin
> Priority: Major
> Fix For: 0.13.0
>
>
> When using spark-sql MERGE INTO, the memory requirement spikes: merging new incoming data for a 120MB Parquet file requires more than 10GB of memory.
>
> From a user:
> We are trying to process 5 GB of input data (Parquet, Snappy compression), which will insert into or update a Hudi table across 4 days (the day is the partition).
> The data size in the Hudi target table for each partition is around 3.5GB to 10GB. Our process keeps failing with OOM (java.lang.OutOfMemoryError: GC overhead limit exceeded).
> We have tried 32GB and 64GB of executor memory, each with 3 cores.
> Our process runs fine when we have fewer updates and more inserts.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)