Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/20 23:09:31 UTC

[GitHub] [iceberg] electrum commented on a change in pull request #2064: Flink: Support write.distribution-mode.

electrum commented on a change in pull request #2064:
URL: https://github.com/apache/iceberg/pull/2064#discussion_r561367698



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -138,6 +138,9 @@ private TableProperties() {
   public static final String ENGINE_HIVE_ENABLED = "engine.hive.enabled";
   public static final boolean ENGINE_HIVE_ENABLED_DEFAULT = false;
 
+  public static final String WRITE_SHUFFLE_BY_PARTITION = "write.shuffle-by.partition";

Review comment:
       @rdblue thanks for the pointer. Here are my thoughts on how this would work for Trino (formerly Presto SQL).
   
    Trino does streaming execution between stages -- there is no materialized shuffle phase. This means a global sort would only be possible using fixed key ranges chosen up front, not ranges derived from statistics, so it would be vulnerable to skew (see the sketch below). I'd like to understand the use case for a global "sort" compared to "partition".
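    To illustrate why fixed ranges are skew-prone, here is a hypothetical router that assigns rows to writers by pre-chosen key boundaries. None of these names are Trino internals; the class and boundaries are invented for the example.
    
    ```java
    import java.util.Arrays;
    
    // Hypothetical: with streaming execution, range boundaries must be fixed
    // before any data is seen, so a hot key range overloads a single writer.
    public class FixedRangeRouter {
      private final long[] boundaries;  // sorted upper bounds, chosen up front
    
      public FixedRangeRouter(long[] boundaries) {
        this.boundaries = boundaries;
      }
    
      // Returns the index of the writer responsible for this sort key.
      public int writerFor(long sortKey) {
        int pos = Arrays.binarySearch(boundaries, sortKey);
        return pos >= 0 ? pos : -pos - 1;  // insertion point = range index
      }
    }
    ```
    
    If most keys land in one range, a single writer receives most of the data; a statistics-based plan would instead pick boundaries that balance the actual key distribution.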
   
   For local sorting, I see two choices:
   
    1. Write arbitrarily large files. Use a fixed-size in-memory buffer, sort it when full, write it to a temporary file, then merge the temporary files at the end. There may be multiple merge passes to limit the number of files read at once during the merge. This is what we do for Hive bucketed-sorted tables, since sorting per bucket is required.
    2. Write multiple size-limited files. Use a fixed-size in-memory buffer, sort it when full, and write a final output file. Repeat until all input data for the writer has been consumed.
   
    I would prefer the second option, as it is simpler and uses fewer resources. It satisfies the property that each file is sorted, which helps with compression and within-file filtering. The downside is more files, but if they are of sufficient size, this shouldn't affect reads, since we split files when reading anyway. A minimal sketch of this approach follows.
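    The sketch below assumes rows fit a fixed-size buffer and writes plain text files; the class and its file naming are stand-ins, not Iceberg or Trino APIs.
    
    ```java
    import java.io.IOException;
    import java.io.Writer;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    
    // Hypothetical writer: buffer rows, sort the buffer when full, and emit
    // one size-limited sorted file per flush. Each file is sorted on its own;
    // nothing orders rows across files.
    public class SizeLimitedSortedFileWriter {
      private final int bufferLimit;           // rows per file; stands in for a byte budget
      private final Comparator<String> order;  // the table's sort order
      private final Path outputDir;
      private final List<String> buffer = new ArrayList<>();
      private int fileCount = 0;
    
      public SizeLimitedSortedFileWriter(int bufferLimit, Comparator<String> order, Path outputDir) {
        this.bufferLimit = bufferLimit;
        this.order = order;
        this.outputDir = outputDir;
      }
    
      public void write(String row) throws IOException {
        buffer.add(row);
        if (buffer.size() >= bufferLimit) {
          flush();  // sort and emit a final (not temporary) output file
        }
      }
    
      public void close() throws IOException {
        if (!buffer.isEmpty()) {
          flush();
        }
      }
    
      private void flush() throws IOException {
        buffer.sort(order);
        Path file = outputDir.resolve("part-" + fileCount++ + ".txt");
        try (Writer out = Files.newBufferedWriter(file)) {
          for (String row : buffer) {
            out.write(row);
            out.write('\n');
          }
        }
        buffer.clear();
      }
    }
    ```
    
    Unlike option 1, there is no merge phase: each full buffer is written directly as a final file, which is what keeps the resource usage low.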
   
    Another option is to sort the data in a fixed-size buffer before writing each batch of rows. This would help with compression and within-file filtering, but wouldn't provide a sorting guarantee for readers. A sketch of that variant follows.
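    Again with invented names: each batch is sorted before being appended to the same open file, so ordering holds within a batch but not across the file.
    
    ```java
    import java.io.IOException;
    import java.io.Writer;
    import java.util.Comparator;
    import java.util.List;
    
    class BatchSortingWriter {
      // Hypothetical helper: sorts one buffered batch and appends it to an
      // already-open file. Later batches restart the order, so readers get
      // no file-level sorting guarantee -- only better compression locality.
      static void writeBatch(List<String> batch, Writer out, Comparator<String> order)
          throws IOException {
        batch.sort(order);  // order within this batch only
        for (String row : batch) {
          out.write(row);
          out.write('\n');
        }
      }
    }
    ```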






