Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/30 07:28:09 UTC

[GitHub] [iceberg] rdblue commented on issue #1257: Support vectorized reads with identity transforms

rdblue commented on issue #1257:
URL: https://github.com/apache/iceberg/issues/1257#issuecomment-665932497


   I think @shardulm94 is right. Iceberg writes all data columns into every file, unlike Hive, which leaves partition columns out of the data files. We write these columns because we may want to move the files to a different partition spec later (e.g., drop a categorical column, move the files, then compact).
   
   So this is probably working because the data files in the tests actually do contain the data columns. To make the tests fail, generate the data files without the identity-partitioned columns, add them to a table, then validate that you still get those columns when you read. That's basically what happens when we import Hive data into Iceberg tables: the columns aren't present in the data files, so we use the values from the file's partition tuple.
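   To illustrate the read-side behavior described above, here is a minimal conceptual sketch (plain Python, not Iceberg's actual API; the function and parameter names are hypothetical): when a projected column is missing from a data file, as with imported Hive files, the reader fills it in as a constant taken from the file's partition tuple.

   ```python
   # Conceptual sketch only -- Iceberg implements this in its readers;
   # this just models filling identity-partition columns from the
   # file's partition tuple when the data file does not store them.

   def read_with_identity_fill(file_rows, projected_columns, partition_tuple):
       """Project rows, substituting partition-tuple values for columns
       that are missing from the data file."""
       out = []
       for row in file_rows:
           projected = {}
           for col in projected_columns:
               if col in row:
                   projected[col] = row[col]              # stored in the file
               elif col in partition_tuple:
                   projected[col] = partition_tuple[col]  # identity partition value
               else:
                   projected[col] = None                  # truly missing: null
           out.append(projected)
       return out

   # A Hive-style file for partition category=books stores only the data columns.
   file_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
   rows = read_with_identity_fill(
       file_rows, ["id", "name", "category"], {"category": "books"})
   # Every row now carries category="books" even though the file never stored it.
   ```

   A test built this way (files written without the identity column) would expose a vectorized reader that only returns columns physically present in the file.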

