Posted to dev@iceberg.apache.org by GitBox <gi...@apache.org> on 2018/12/07 18:08:19 UTC

[GitHub] rdblue opened a new issue #36: Split files when planning scan tasks

URL: https://github.com/apache/incubator-iceberg/issues/36
 
 
   When building a scan, the TableScan API can either plan the files to read (`planFiles`) or group those files into combined splits (`planTasks`). Split planning should also split each file at the target split size before bin packing, so that the final splits are built from pieces no larger than the target.
   
   This relates to adding split locations (row group or stripe offsets) to the manifest file. The simple version of this issue is to split at the target split size and then combine the pieces. Eventually, if it makes sense to store split offsets in the manifest file, planning should take those offsets into account.
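
   A minimal sketch of the simple version (split at the target size, then combine) might look like the following. It is not the Iceberg implementation: `SplitPlanningSketch`, `FileSplit`, `splitFile`, and `binPack` are hypothetical names, the file names and sizes are made up, and the packing is a plain sequential first-fit used only for illustration. It ignores row group and stripe offsets entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these types or methods are Iceberg APIs.
public class SplitPlanningSketch {

    /** A byte range of one data file (hypothetical helper type). */
    public static final class FileSplit {
        public final String path;
        public final long start;
        public final long length;

        public FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + "[" + start + "+" + length + "]";
        }
    }

    /** Step 1: cut a file into ranges of at most targetSize bytes. */
    public static List<FileSplit> splitFile(String path, long fileLength, long targetSize) {
        if (targetSize <= 0) {
            throw new IllegalArgumentException("targetSize must be positive");
        }
        List<FileSplit> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += targetSize) {
            long length = Math.min(targetSize, fileLength - offset);
            splits.add(new FileSplit(path, offset, length));
        }
        return splits;
    }

    /** Step 2: pack the ranges, in order, into tasks of at most targetSize bytes. */
    public static List<List<FileSplit>> binPack(List<FileSplit> splits, long targetSize) {
        List<List<FileSplit>> tasks = new ArrayList<>();
        List<FileSplit> current = new ArrayList<>();
        long currentSize = 0;
        for (FileSplit split : splits) {
            // Close the current task if adding this range would exceed the target.
            if (!current.isEmpty() && currentSize + split.length > targetSize) {
                tasks.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(split);
            currentSize += split.length;
        }
        if (!current.isEmpty()) {
            tasks.add(current);
        }
        return tasks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long target = 128 * mb; // assumed target split size

        List<FileSplit> splits = new ArrayList<>();
        splits.addAll(splitFile("a.parquet", 300 * mb, target)); // large file gets cut
        splits.addAll(splitFile("b.parquet", 40 * mb, target));  // small files stay whole
        splits.addAll(splitFile("c.parquet", 40 * mb, target));

        for (List<FileSplit> task : binPack(splits, target)) {
            System.out.println(task);
        }
    }
}
```

   Splitting before packing lets the tail of a large file share a task with small files instead of inflating one oversized task. A version that used stored split offsets would replace the fixed-size loop in `splitFile` with cuts at those offsets.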
