Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/18 21:03:05 UTC

[GitHub] [iceberg] sunchao commented on pull request #2276: Core: Add option to combine tasks by partition

sunchao commented on PR #2276:
URL: https://github.com/apache/iceberg/pull/2276#issuecomment-1283000536

   Thanks @aokolnychyi! Let me fix the API compatibility check too.
   
   > I think it is reasonable to not combine files across partitions for partitioned tables by default in Spark, hoping we can benefit from storage-partitioned joins. However, I worry the new behavior may cause performance regressions in some cases as we will generate more splits (even though we may not benefit from any join optimizations). Do we want to expose a way to force combining files across partitions (i.e. old behavior)? There are two ways to support that: either add a read option in Iceberg or try checking if storage-partitioned joins are enabled in Spark SQL (if not, we can safely combine). Since Spark will pass join attributes in the future, adding a read option does not seem preferable. Any thoughts?
   
   As discussed offline, this adds a Spark SQL conf, `spark.sql.iceberg.splits-by-partition`, that controls whether Iceberg combines splits across partition boundaries. There is work planned on the Spark side to add APIs that pass this info to Iceberg, which is a better solution and will eventually supersede this approach.
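
   To make the trade-off concrete, here is a rough, self-contained sketch of the two planning modes. It is not the code in this PR or Iceberg's planner: `FileTask`, `pack`, `plan_splits`, and the greedy in-order packing are hypothetical stand-ins, and it assumes that setting the flag to true means splits never cross a partition boundary.

```python
from collections import defaultdict
from typing import List, NamedTuple


class FileTask(NamedTuple):
    """Hypothetical stand-in for a file scan task (not an Iceberg class)."""
    path: str
    partition: str
    size: int


def pack(tasks: List[FileTask], target_size: int) -> List[List[FileTask]]:
    """Greedy in-order packing: close the current split once adding the
    next file would push it past target_size."""
    splits, current, current_size = [], [], 0
    for task in tasks:
        if current and current_size + task.size > target_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(task)
        current_size += task.size
    if current:
        splits.append(current)
    return splits


def plan_splits(tasks: List[FileTask], target_size: int,
                splits_by_partition: bool) -> List[List[FileTask]]:
    """Assumed semantics: when splits_by_partition is True, files are
    packed within each partition only; otherwise across all files."""
    if not splits_by_partition:
        return pack(tasks, target_size)
    by_partition = defaultdict(list)
    for task in tasks:
        by_partition[task.partition].append(task)
    splits = []
    for partition_tasks in by_partition.values():
        splits.extend(pack(partition_tasks, target_size))
    return splits


tasks = [
    FileTask("a1.parquet", "day=1", 40),
    FileTask("a2.parquet", "day=1", 40),
    FileTask("b1.parquet", "day=2", 40),
    FileTask("c1.parquet", "day=3", 40),
]

combined = plan_splits(tasks, target_size=100, splits_by_partition=False)
per_partition = plan_splits(tasks, target_size=100, splits_by_partition=True)

print(len(combined), len(per_partition))
```

   With these made-up sizes, packing across partitions yields fewer splits, while packing per partition keeps each split within a single partition (what storage-partitioned joins would need) at the cost of more, smaller splits. That is the regression risk raised in the quoted comment.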
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

