Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/09/26 15:26:00 UTC

[jira] [Created] (HUDI-4919) Sql MERGE INTO incurs too much memory overhead

sivabalan narayanan created HUDI-4919:
-----------------------------------------

             Summary: Sql MERGE INTO incurs too much memory overhead
                 Key: HUDI-4919
                 URL: https://issues.apache.org/jira/browse/HUDI-4919
             Project: Apache Hudi
          Issue Type: Bug
          Components: spark-sql
            Reporter: sivabalan narayanan


When using spark-sql MERGE INTO, the memory requirement shoots up: merging new incoming data into a 120MB parquet file requires more than 10GB of memory.
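
Below is a minimal sketch of the kind of statement that triggers this. The table and column names (hudi_target, staged_updates, id) are hypothetical and only illustrate the shape of the workload, not the user's actual query.

    -- hypothetical target and source names; UPDATE SET * / INSERT * follow Hudi's spark-sql MERGE INTO syntax
    MERGE INTO hudi_target AS t
    USING (SELECT * FROM staged_updates) AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *;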

 

From the user:

We are trying to process input data of about 5 GB (Parquet, snappy compression), which will insert/update the Hudi table across 4 days (day is the partition column).
My data size in the Hudi target table is around 3.5GB to 10GB per partition. Our process keeps failing with OOM (java.lang.OutOfMemoryError: GC overhead limit exceeded).
We have tried 32GB and 64GB of executor memory with 3 cores.
Our process runs fine when we have fewer updates and more inserts.
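
For reference, a launch invocation along the lines of the one below matches the resources described above; the query file name is an assumption, and only the executor memory (32GB/64GB were tried) and 3 cores come from the report.

    # hypothetical spark-sql launch mirroring the reported settings (32g was also tried)
    spark-sql \
      --executor-memory 64g \
      --executor-cores 3 \
      -f merge_into_day_partitions.sql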

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)