Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/31 14:55:10 UTC

[GitHub] [iceberg] RussellSpitzer opened a new issue #1281: Iceberg Datasource Writer Should Automatically Prune Identity Transform Partition Columns

RussellSpitzer opened a new issue #1281:
URL: https://github.com/apache/iceberg/issues/1281


   While writing the vectorized reader for identity transforms in Parquet, I noticed that writing to a Parquet-backed Iceberg table writes every column present in the data, even columns whose values are already determined by the partitioning. I think it probably makes sense to strip out these columns on write to save space and time. Any thoughts?
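
As a rough illustration of the proposal, the sketch below splits each row into partition values and data-file columns, dropping only the columns that are sources of an identity transform. It uses plain Python structures rather than the Iceberg API; `PartitionField`, `split_row`, and `merge_row` are hypothetical names invented for this example. Columns behind non-identity transforms such as `bucket` or `day` are kept, since the original value cannot be recovered from the transformed partition value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionField:
    """Hypothetical stand-in for one field of a partition spec."""
    source_column: str
    transform: str  # e.g. "identity", "bucket", "day"


def identity_sources(partition_fields):
    """Names of columns whose partition value equals the column value."""
    return {f.source_column for f in partition_fields if f.transform == "identity"}


def split_row(row, partition_fields):
    """Write side: separate identity-partition values from the columns
    that would still have to be stored in the data file."""
    pruned = identity_sources(partition_fields)
    partition_values = {k: v for k, v in row.items() if k in pruned}
    data_columns = {k: v for k, v in row.items() if k not in pruned}
    return partition_values, data_columns


def merge_row(data_columns, partition_values):
    """Read side: re-attach the constant partition values to a stored row."""
    return {**data_columns, **partition_values}


fields = [PartitionField("region", "identity"), PartitionField("ts", "day")]
row = {"id": 1, "region": "us", "ts": "2020-07-31T14:55:10"}

partition_values, data_columns = split_row(row, fields)
# "region" is dropped from the data file; "ts" stays because day(ts) is lossy.
assert merge_row(data_columns, partition_values) == row
```

The read side only works if the partition values in table metadata are treated as the source of truth for the pruned columns, which is the design question this issue raises.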


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] ericsun2 commented on issue #1281: Iceberg Datasource Writer Should Automatically Prune Identity Transform Partition Columns

Posted by GitBox <gi...@apache.org>.

ericsun2 commented on issue #1281:
URL: https://github.com/apache/iceberg/issues/1281#issuecomment-699511832


   So you are suggesting the Hive-style Virtual Partition Column, which is only stored in metadata but not in the data files?
   Then that is probably against the original design of Iceberg.
   The space waste for such repetitive values of `Identity()` Transform in Parquet/ORC should be close to nothing.


[GitHub] [iceberg] ericsun2 edited a comment on issue #1281: Iceberg Datasource Writer Should Automatically Prune Identity Transform Partition Columns

Posted by GitBox <gi...@apache.org>.

ericsun2 edited a comment on issue #1281:
URL: https://github.com/apache/iceberg/issues/1281#issuecomment-699511832


   So are you suggesting the Hive-style Virtual Partition Column which is only stored in metadata but not in the data files?
   Then that is probably against the original design of Iceberg.
   The space waste for such repetitive values of `Identity()` Transform in Parquet/ORC should be close to nothing.
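
To make the space argument concrete, here is a simplified sketch of run-length encoding, the general idea behind why a column holding a single repeated value costs very little. This is an illustration only, not the actual Parquet or ORC on-disk format (Parquet, for instance, typically combines dictionary encoding with an RLE/bit-packing hybrid); `run_length_encode` is a hypothetical helper written for this example.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return [(value, count) for value, count in runs]


# Within one data file of an identity-partitioned table, the partition
# column holds the same value for every row.
column = ["2020-07-31"] * 1_000_000
encoded = run_length_encode(column)
assert encoded == [("2020-07-31", 1_000_000)]  # one run, regardless of row count
```

Under this simplification the stored size of the column does not grow with the number of rows, which is consistent with the "close to nothing" estimate above. The write-time cost of materializing and encoding the values is a separate question that this sketch does not measure.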

