Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/20 02:56:44 UTC

[GitHub] [iceberg] openinx commented on a change in pull request #2064: Flink: Support write.distribution-mode.

openinx commented on a change in pull request #2064:
URL: https://github.com/apache/iceberg/pull/2064#discussion_r560641150



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -138,6 +138,9 @@ private TableProperties() {
   public static final String ENGINE_HIVE_ENABLED = "engine.hive.enabled";
   public static final boolean ENGINE_HIVE_ENABLED_DEFAULT = false;
 
+  public static final String WRITE_SHUFFLE_BY_PARTITION = "write.shuffle-by.partition";

Review comment:
       > Flink may eventually provide a way to order within data files, but I think that is less important than clustering data across files so that data files can be skipped in queries.
   
   Agreed. Sorting within data files would be really helpful for page skipping, but it would add more cost to a streaming job. Range distribution by sort keys is coarse-grained, but it is good enough for a streaming job to cluster keys so that readers can filter out whole data files. I think it is a better-balanced choice when trading off write efficiency against read performance.
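To make the trade-off concrete, here is a minimal sketch in plain Java. It has no Iceberg or Flink dependencies, and every class and method name in it is hypothetical. It assumes the range boundaries are already known (for example, computed from a sample of the sort key). Each row is routed to a writer by range, and each output file keeps only the min/max of the key, which is the kind of column bound a reader can use to skip files. For comparison, the same rows are also written round-robin. That produces overlapping ranges, so a point lookup cannot skip any file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntBinaryOperator;

// Hypothetical names throughout; this is not an Iceberg or Flink API.
public class RangeDistributionSketch {

    // Per-file statistics: only the min/max of the sort key.
    static final class FileStats {
        final int min;
        final int max;

        FileStats(int min, int max) {
            this.min = min;
            this.max = max;
        }

        // A reader can skip this file when the value falls outside [min, max].
        boolean mightContain(int value) {
            return value >= min && value <= max;
        }
    }

    // Range distribution: writer i receives keys below upperBounds[i] (and at or above the previous bound).
    static int rangeWriter(int key, int[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length; // the last writer takes every key >= the final bound
    }

    // Simulate one file per writer. Rows are NOT sorted inside a file; only min/max is tracked.
    // router maps (rowIndex, key) to a writer id in [0, writers).
    static List<FileStats> write(int[] keys, int writers, IntBinaryOperator router) {
        int[] min = new int[writers];
        int[] max = new int[writers];
        boolean[] used = new boolean[writers];
        Arrays.fill(min, Integer.MAX_VALUE);
        Arrays.fill(max, Integer.MIN_VALUE);
        for (int i = 0; i < keys.length; i++) {
            int w = router.applyAsInt(i, keys[i]);
            min[w] = Math.min(min[w], keys[i]);
            max[w] = Math.max(max[w], keys[i]);
            used[w] = true;
        }
        List<FileStats> files = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            if (used[w]) {
                files.add(new FileStats(min[w], max[w]));
            }
        }
        return files;
    }

    // Number of files a point query on the sort key still has to open.
    static long filesToScan(List<FileStats> files, int value) {
        return files.stream().filter(f -> f.mightContain(value)).count();
    }

    public static void main(String[] args) {
        int[] keys = {42, 7, 93, 15, 68, 3, 51, 88, 29, 74};
        int[] bounds = {25, 50, 75}; // assumed boundaries, e.g. taken from a sample of the keys
        int writers = bounds.length + 1;

        List<FileStats> ranged = write(keys, writers, (i, key) -> rangeWriter(key, bounds));
        List<FileStats> roundRobin = write(keys, writers, (i, key) -> i % writers);

        System.out.println("files to scan for key 68: range=" + filesToScan(ranged, 68)
            + ", round-robin=" + filesToScan(roundRobin, 68));
    }
}
```

Rows inside each file stay in arrival order, which keeps the streaming writer cheap. All of the clustering comes from the distribution step.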
   
   It makes sense to me to rewrite those range-distributed data files into row-ordered files if there are heavy reads that depend on them.
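The follow-up rewrite can be sketched the same way. Again, this is plain Java with hypothetical names, not an Iceberg or Flink API. One range-distributed file's rows are split into fixed-size "pages", each with its own min/max. This is loosely modeled on page-level statistics in formats such as Parquet. Sorting the rows during the rewrite makes the page ranges disjoint, so a point lookup has to read fewer pages.

```java
import java.util.Arrays;

// Hypothetical names; this only illustrates why row ordering inside a file helps page skipping.
public class PageSkippingSketch {

    // Count fixed-size pages whose min/max range could contain the value,
    // i.e. pages a reader could not skip using per-page statistics alone.
    static int pagesToRead(int[] rows, int pageSize, int value) {
        int pages = 0;
        for (int start = 0; start < rows.length; start += pageSize) {
            int end = Math.min(start + pageSize, rows.length);
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, rows[i]);
                max = Math.max(max, rows[i]);
            }
            if (value >= min && value <= max) {
                pages++;
            }
        }
        return pages;
    }

    // Compaction-style rewrite: the same rows, ordered by the sort key within the file.
    static int[] rewriteSorted(int[] rows) {
        int[] sorted = Arrays.copyOf(rows, rows.length);
        Arrays.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Rows of one range-distributed file: keys in [51, 74], still in arrival order.
        int[] rows = {68, 51, 74, 55, 70, 62, 53, 73, 59, 66, 52, 71};
        int pageSize = 3;
        System.out.println("pages to read for key 60: unsorted=" + pagesToRead(rows, pageSize, 60)
            + ", sorted=" + pagesToRead(rewriteSorted(rows), pageSize, 60));
    }
}
```

The rewrite pays the sort cost once, offline, instead of on every streaming checkpoint. That is why it fits tables with heavy reads.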



