Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/08 01:03:38 UTC

[GitHub] [iceberg] rdblue commented on pull request #6369: Increase Partition Start Id to 10000

rdblue commented on PR #6369:
URL: https://github.com/apache/iceberg/pull/6369#issuecomment-1341825580

   Looks like @RussellSpitzer, @szehon-ho, and @aokolnychyi are looking at this and have noted the issues with v1 tables.
   
   I think this is risky: not all v1 readers use partition field IDs, even though we now write them into partition specs. Currently, we are careful that those IDs are always the same, but this change would cause them to differ. It may be safe, but I'd test it very thoroughly and possibly put it behind a flag.
   
   I'd also like to understand why this is needed. Partition field IDs are stored in manifest files, not data files. Partition field IDs should generally not mix with data field IDs from the Iceberg schema.
   
   The only case I can think of right now is projecting the `_partition` metadata field when reading a table, but in that case I think there needs to be a better solution. This PR only moves the collision point: a schema with 10,000 fields could still collide. We should just assign new field IDs to the `_partition` metadata fields.
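
To make the collision argument above concrete, here is a minimal sketch in plain Python. It does not use the Iceberg API. The helper names (`partition_field_ids`, `find_collisions`, `reassign_partition_ids`) are hypothetical, the current start value of 1000 is an assumption (the PR title only states the proposed value of 10000), and data field IDs are assumed to be contiguous from 1, which real schemas need not be.

```python
def partition_field_ids(start, count):
    """Hypothetical: sequential partition field IDs beginning at `start`."""
    return [start + i for i in range(count)]


def find_collisions(schema_ids, partition_ids):
    """IDs that appear in both the data schema and the partition fields."""
    return sorted(set(schema_ids) & set(partition_ids))


def reassign_partition_ids(schema_ids, partition_ids):
    """Map each partition field ID to a fresh ID above the largest schema ID.

    This is one possible way to 'assign new field IDs' when the partition
    struct is projected next to data columns; it is an illustration only.
    """
    next_id = max(schema_ids, default=0) + 1
    return {old: next_id + i for i, old in enumerate(partition_ids)}


# A wide table with 1,200 data columns and two partition fields.
wide_schema = list(range(1, 1201))

# Assumed current start of 1000: both partition IDs overlap data field IDs.
print(find_collisions(wide_schema, partition_field_ids(1000, 2)))

# Proposed start of 10000: no overlap for this table...
print(find_collisions(wide_schema, partition_field_ids(10000, 2)))

# ...but a table with 10,000 data columns overlaps again.
huge_schema = list(range(1, 10001))
print(find_collisions(huge_schema, partition_field_ids(10000, 2)))

# Fresh IDs chosen at projection time cannot overlap, whatever the width.
mapping = reassign_partition_ids(huge_schema, partition_field_ids(10000, 2))
print(find_collisions(huge_schema, list(mapping.values())))
```

Raising the start value only moves the threshold at which the overlap appears, while choosing IDs relative to the schema being projected removes the dependence on table width.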


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
