Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/08 02:01:36 UTC

[GitHub] [iceberg] xloya commented on pull request #3977: [Core][Spark][Flink] Change partitioned fanout/delta writers map to caffine cache

xloya commented on pull request #3977:
URL: https://github.com/apache/iceberg/pull/3977#issuecomment-1032133971


   
   > @aokolnychyi, do you think this is a good idea?
   > 
   > I'm not sure about this. What will end up happening in Spark is that you'll create a lot of new data files. But in that case you should have used a better plan that clustered data instead of using the fanout writer. Maybe this is needed in Flink only?
   
   In fact, this happens when a user writes to all partitions using Spark `insert overwrite`. We have tried several approaches, and unless we use `distribute by` in SQL to break up the data, the OOM still occurs.
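
For illustration only, a minimal sketch of the `distribute by` workaround described above. The helper `insert_overwrite_sql`, the table names `db.events` and `db.events_staging`, and the column `event_date` are hypothetical and do not come from the PR. The idea is that `DISTRIBUTE BY` on the partition columns sends rows with the same partition value to the same task, so each task should need to keep far fewer writers open at once.

```python
def insert_overwrite_sql(target, source, partition_cols):
    """Build a Spark SQL INSERT OVERWRITE that redistributes rows by partition columns.

    All names passed in are illustrative assumptions, not taken from the PR.
    """
    if not partition_cols:
        raise ValueError("need at least one partition column")
    cols = ", ".join(partition_cols)
    return (
        f"INSERT OVERWRITE TABLE {target} "
        f"SELECT * FROM {source} "
        f"DISTRIBUTE BY {cols}"
    )


# Hypothetical table and column names.
sql = insert_overwrite_sql("db.events", "db.events_staging", ["event_date"])
print(sql)
# With a SparkSession available, the statement could be submitted via spark.sql(sql).
```

Whether this avoids the OOM in a given job likely depends on the number of partitions and on data skew, so it is a sketch of the approach rather than a guaranteed fix.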


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


