Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/09/26 15:26:00 UTC
[jira] [Created] (HUDI-4919) Sql MERGE INTO incurs too much memory overhead
sivabalan narayanan created HUDI-4919:
-----------------------------------------
Summary: Sql MERGE INTO incurs too much memory overhead
Key: HUDI-4919
URL: https://issues.apache.org/jira/browse/HUDI-4919
Project: Apache Hudi
Issue Type: Bug
Components: spark-sql
Reporter: sivabalan narayanan
When using spark-sql MERGE INTO, the memory requirement shoots up: merging new incoming data into a 120 MB parquet file can require more than 10 GB of memory.
From a user report:
We are trying to process input data of about 5 GB (Parquet, snappy compression), which inserts into / updates a Hudi table across 4 daily partitions (the table is partitioned by day).
The data size in each partition of the target Hudi table is roughly 3.5 GB to 10 GB. The job keeps failing with OOM (java.lang.OutOfMemoryError: GC overhead limit exceeded).
We have tried executor memory of 32 GB and 64 GB as well, with 3 cores.
The process runs fine when we have fewer updates and more inserts.
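For reference, the failing workload is a Spark SQL MERGE INTO of roughly the following shape (a minimal sketch; the table and column names below are hypothetical, not taken from the report):

```sql
-- Hypothetical illustration of the reported workload; hudi_target,
-- incoming_batch, record_key, and day are assumed names.
MERGE INTO hudi_target t
USING (SELECT * FROM incoming_batch) s
ON t.record_key = s.record_key AND t.day = s.day
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

The memory blow-up is observed on the update-heavy path, i.e. when most source rows hit the WHEN MATCHED branch and existing file groups must be rewritten.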
--
This message was sent by Atlassian Jira
(v8.20.10#820010)