Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/06 00:14:02 UTC

[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2276: Spark: Add option to combine tasks by partition

aokolnychyi commented on a change in pull request #2276:
URL: https://github.com/apache/iceberg/pull/2276#discussion_r588793861



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -78,6 +78,9 @@ private TableProperties() {
   public static final String SPLIT_OPEN_FILE_COST = "read.split.open-file-cost";
   public static final long SPLIT_OPEN_FILE_COST_DEFAULT = 4 * 1024 * 1024; // 4MB
 
+  public static final String SPLIT_BY_PARTITION = "read.split.by-partition";

Review comment:
       The problem with read options is that changing the value requires modifying code. I am also not sure that a table property will help us much here: setting this at the table level will probably also affect other engines that may not benefit from bucketed joins.
   
   I am not sure how we can detect whether a table is used in a join or not, though. I don't think Spark propagates that info to sources. Are there any ideas for that?
   
   Overall, I am fine either way. I think we will need a read option, though, since it gives us a way to force a particular value. We may want to default it to true for bucketed tables.
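
   To make the precedence concrete, here is a rough sketch of how the value could be resolved: the read option wins if set, then the table property, and otherwise we fall back to true only for bucketed tables. This is only an illustration under assumptions. The class name, the `split-by-partition` read option key, and the `isBucketed` flag are hypothetical and not part of this PR; only the `read.split.by-partition` property name comes from the diff. Plain maps stand in for the real option and property holders.

```java
import java.util.Map;

// Hypothetical helper, not part of the PR: resolves whether to combine tasks by partition.
final class SplitByPartitionResolver {
  // Table property name taken from the diff under review.
  static final String SPLIT_BY_PARTITION = "read.split.by-partition";
  // Assumed read option key; the PR does not define one.
  static final String SPLIT_BY_PARTITION_OPTION = "split-by-partition";

  private SplitByPartitionResolver() {
  }

  static boolean resolve(Map<String, String> readOptions,
                         Map<String, String> tableProperties,
                         boolean isBucketed) {
    // 1. An explicit read option forces a particular value for this read.
    String fromOption = readOptions.get(SPLIT_BY_PARTITION_OPTION);
    if (fromOption != null) {
      return Boolean.parseBoolean(fromOption);
    }

    // 2. Otherwise, use the table property if someone set it.
    String fromTable = tableProperties.get(SPLIT_BY_PARTITION);
    if (fromTable != null) {
      return Boolean.parseBoolean(fromTable);
    }

    // 3. Otherwise, default to true only for bucketed tables.
    return isBucketed;
  }
}
```

   With this ordering, a job can still pass the read option as `false` to opt out on a bucketed table, and engines that never set the option only see the table-level value or the default.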



