Posted to issues@iceberg.apache.org by "ldwnt (via GitHub)" <gi...@apache.org> on 2023/03/01 05:47:05 UTC

[GitHub] [iceberg] ldwnt commented on issue #6956: Spark: Data file rewriting spark job fails with oom

ldwnt commented on issue #6956:
URL: https://github.com/apache/iceberg/issues/6956#issuecomment-1449386215

   > @ldwnt, if the upstream table is completely refreshed every day, then why use a stream to move the data over to analytic storage? Seems like using a one-time copy after the refresh makes more sense.
   > 
   > I also think that, in general, directly updating an analytic table from Flink is a bad idea. It's usually much more efficient to write the changes directly into a table and periodically compact to materialize the latest table state.
   
   It's possible to handle the completely refreshed tables in the way you mentioned. The reason I don't is that I'm ingesting tables from 20 MySQL databases into Iceberg and want to achieve that with the same set of Flink applications.
   
   I changed the Spark executor memory from 3g to 4g and the rewrite finished without an OOM. The cause of the OOM seems to be the large number of delete records collected in memory. Also, the ~1.4g executor memory usage displayed in the Spark UI does not appear to be accurate.
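
   For reference, a minimal sketch of what the memory bump looks like when launching the rewrite from a Spark application (assuming Spark 3 with the Iceberg runtime jar on the classpath; the catalog name "my_catalog" and table "db.tbl" below are placeholders):

       import org.apache.spark.sql.SparkSession;

       public class RewriteWithMoreMemory {
           public static void main(String[] args) {
               SparkSession spark = SparkSession.builder()
                   .appName("iceberg-rewrite-data-files")
                   // 3g was not enough here; 4g let the rewrite finish without an OOM
                   .config("spark.executor.memory", "4g")
                   // required for Iceberg stored procedures such as rewrite_data_files
                   .config("spark.sql.extensions",
                       "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
                   .getOrCreate();

               // Compact the table's data files, applying accumulated deletes
               spark.sql("CALL my_catalog.system.rewrite_data_files(table => 'db.tbl')");
           }
       }

   The same setting can also be passed at submit time via spark-submit --executor-memory 4g instead of in the session builder.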


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org