Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/09 17:17:04 UTC

[GitHub] [iceberg] szehon-ho commented on pull request #4898: Core: Add schema_id to ContentFile/ManifestFile

szehon-ho commented on PR #4898:
URL: https://github.com/apache/iceberg/pull/4898#issuecomment-1209652918

   @ConeyLiu that's a good question. I think (I may be wrong) rewriteDataFiles groups files by partition/partition spec and may not preserve the old schemas, i.e., all the data files in a group are rewritten with the latest schema of that partition spec.
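   A minimal sketch of the grouping behaviour described above, assuming files are bucketed only by (partition spec id, partition value) and each bucket is rewritten into one output file under a single schema. `DataFile`, `group_for_rewrite`, `rewrite_group` and the `write_schema_id` parameter are hypothetical illustrations, not Iceberg's API; the point is only that inputs with different schema ids collapse into one output that records a single schema id.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-in for a data file entry; this is not Iceberg's DataFile API.
@dataclass(frozen=True)
class DataFile:
    path: str
    spec_id: int
    partition: Tuple[str, ...]
    schema_id: int

GroupKey = Tuple[int, Tuple[str, ...]]

def group_for_rewrite(files: List[DataFile]) -> Dict[GroupKey, List[DataFile]]:
    """Bucket files by (partition spec id, partition value); schema_id is ignored."""
    groups: Dict[GroupKey, List[DataFile]] = defaultdict(list)
    for f in files:
        groups[(f.spec_id, f.partition)].append(f)
    return dict(groups)

def rewrite_group(key: GroupKey, write_schema_id: int, out_path: str) -> DataFile:
    """Emit one output file; all of its rows are written with a single schema."""
    spec_id, partition = key
    return DataFile(out_path, spec_id, partition, write_schema_id)

files = [
    DataFile("a.parquet", 0, ("2022-08-01",), 1),
    DataFile("b.parquet", 0, ("2022-08-01",), 3),
    DataFile("c.parquet", 0, ("2022-08-02",), 2),
]
groups = group_for_rewrite(files)
key = (0, ("2022-08-01",))
merged = rewrite_group(key, write_schema_id=3, out_path="merged.parquet")
print(sorted(f.schema_id for f in groups[key]), "->", merged.schema_id)  # [1, 3] -> 3
```

   Here a.parquet (schema 1) and b.parquet (schema 3) share a partition, so after the rewrite only schema 3 is recorded and the fact that some rows came from schema 1 is gone.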
   
   I think the situation would be the same even with your proposal to add a new schema_id field to data_file, right? After rewriteDataFiles, wouldn't we have to carry over the latest schema-id of each spec for your originally proposed optimization to be accurate? The new file may contain data that was written with a later schema.
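   To illustrate why the rewritten file would need the latest schema id, here is a sketch under two loud assumptions: (1) the proposed optimization skips a file when a filter references a column added in a schema newer than the file's schema_id, and (2) schema ids only increase over time. `DataFile`, `can_skip` and `rewritten_schema_id` are hypothetical names, not part of Iceberg.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical data file entry carrying the proposed schema_id field.
@dataclass(frozen=True)
class DataFile:
    path: str
    spec_id: int
    schema_id: int

def can_skip(f: DataFile, column_added_in: int) -> bool:
    """Assumed pruning rule: a file written before the column existed holds only
    nulls for it, so an equality filter on that column cannot match."""
    return f.schema_id < column_added_in

def rewritten_schema_id(inputs: List[DataFile]) -> int:
    """Carry over the newest schema id among the inputs (assumes ids only grow)."""
    return max(f.schema_id for f in inputs)

inputs = [DataFile("a.parquet", 0, 1), DataFile("b.parquet", 0, 3)]
# Suppose column `c` was added in schema 3, so rows from b.parquet may have values for it.
latest = DataFile("merged.parquet", 0, rewritten_schema_id(inputs))
oldest = DataFile("merged.parquet", 0, min(f.schema_id for f in inputs))
print(can_skip(latest, 3))  # False: the merged file is scanned, which is correct
print(can_skip(oldest, 3))  # True: the merged file is skipped and b.parquet's matching rows are missed
```

   Recording anything older than the newest input schema lets the pruning rule discard a file that can still contain matching rows; taking the maximum keeps the check conservative.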


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org