Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/18 08:22:21 UTC

[GitHub] [iceberg] kbendick commented on issue #3885: [OOM] MERGE INTO table with Spark Structured Streaming

kbendick commented on issue #3885:
URL: https://github.com/apache/iceberg/issues/3885#issuecomment-1015171549


   `unpersist` is non-blocking by default. You might consider passing `blocking=true` (I believe that's the argument) so that the DataFrame is actually unpersisted by the time the call returns. Because the call blocks, this can add some time, but it will lower the likelihood of OOMs if that's where you think they are coming from.
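
As a rough sketch of the suggestion above, here is how a `foreachBatch`-style handler could release its cache with a blocking call. In PySpark the keyword is `blocking=True`. The names `merge_batch` and `run_merge` are hypothetical and not taken from the issue; `run_merge` stands in for whatever executes the MERGE INTO for the micro-batch.

```python
def merge_batch(batch_df, batch_id, run_merge):
    """Cache a micro-batch, run the merge step, then release the cache.

    `run_merge` is a hypothetical callable (batch_df, batch_id) -> None
    supplied by the caller; it is an assumption made for this sketch.
    """
    batch_df.persist()
    try:
        run_merge(batch_df, batch_id)
    finally:
        # blocking=True waits until the cached blocks have been removed,
        # instead of returning immediately (the default is non-blocking).
        batch_df.unpersist(blocking=True)
```

The `try`/`finally` is a design choice for the sketch: the cache is released even when the merge step raises, so a failed batch does not leave cached blocks behind.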
   
   It's a small adjustment you could try to see how it works. But I would go with Russell's answer.
   
   Also, if you update to Spark 3.1, you can use multiple MATCHED clauses in the same query, which could also significantly reduce your runtime (or at least code complexity).
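
To illustrate what a single query with more than one MATCHED clause might look like, here is a minimal sketch. The table name `db.target`, the source view `updates`, and the columns `id`, `value`, and `op` are all hypothetical, and it assumes the Iceberg SQL extensions are enabled on the session.

```python
# Hypothetical table, view, and column names, for illustration only.
MERGE_SQL = """
MERGE INTO db.target t
USING updates s
ON t.id = s.id
WHEN MATCHED AND s.op = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
"""


def run_single_merge(spark):
    """Run the one MERGE statement on an existing SparkSession."""
    return spark.sql(MERGE_SQL)
```

Deletes and updates are handled in one pass over the target here, rather than in one MERGE per action.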


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


