Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/02/07 21:43:03 UTC

[GitHub] [iceberg] jackye1995 commented on pull request #2096: Core: add schema id and schemas to table metadata

jackye1995 commented on pull request #2096:
URL: https://github.com/apache/iceberg/pull/2096#issuecomment-774773226


   Works fine for me so far. I tested a few operations across several clusters running different versions.
   
   One behavior I noticed: if an old v1 writer is still used to create a new table version after a v2 writer has been used, all schema history is dropped. The next schema written by a v2 writer then gets a schema id starting from 0 again, which can break time travel queries.
   
   In theory, once a table is written by a v2 writer, it should never be touched by a v1 writer. In practice we cannot prevent people from doing that, especially in a large organization where people use different versions of the same software all the time. So I feel we still need something like a `last-assigned-schema-id`, so that schemas don't accidentally share the same id and we can backfill the earlier schemas and their ids if necessary. But the `last-assigned-schema-id` field should not be part of the table metadata fields, because it would be dropped by an old writer. One option might be to put it in table properties, but there might be better ways. What do you think?
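
To make the concern concrete, here is a toy sketch of the id collision and of the table-properties idea. It is not Iceberg code: metadata is modeled as a plain dict, the helper names (`next_schema_id`, `new_writer_add_schema`, `old_writer_rewrite`) are hypothetical, and it assumes an old writer preserves table properties while dropping metadata fields it does not know about.

```python
# Toy model only; none of these helpers exist in Iceberg.
LAST_ID_PROP = "last-assigned-schema-id"  # property name taken from the proposal above


def next_schema_id(metadata, use_property):
    """Pick the id for a new schema."""
    props = metadata["properties"]
    if use_property and LAST_ID_PROP in props:
        # The counter survives even if the schema list was dropped.
        return int(props[LAST_ID_PROP]) + 1
    ids = [s["schema-id"] for s in metadata.get("schemas", [])]
    return max(ids) + 1 if ids else 0


def new_writer_add_schema(metadata, fields, use_property=True):
    """Simulate a v2 writer that tracks every schema by id."""
    schema_id = next_schema_id(metadata, use_property)
    props = dict(metadata["properties"])
    if use_property:
        props[LAST_ID_PROP] = str(schema_id)
    return {
        "schemas": metadata.get("schemas", [])
        + [{"schema-id": schema_id, "fields": fields}],
        "current-schema-id": schema_id,
        "properties": props,
    }


def old_writer_rewrite(metadata):
    """Simulate a v1 writer: unknown fields (the schema list) are lost,
    table properties are assumed to be carried over."""
    return {"properties": dict(metadata["properties"])}


def run(use_property):
    """v2 write, v2 write, v1 rewrite, v2 write; return the final schema id."""
    m = {"properties": {}}
    m = new_writer_add_schema(m, ["a"], use_property)
    m = new_writer_add_schema(m, ["a", "b"], use_property)
    m = old_writer_rewrite(m)
    m = new_writer_add_schema(m, ["a", "b", "c"], use_property)
    return m["current-schema-id"]


if __name__ == "__main__":
    print("without property:", run(use_property=False))  # id restarts and collides
    print("with property:", run(use_property=True))      # id keeps increasing
```

In this model the counter only prevents id reuse. The dropped schemas themselves would still have to be backfilled from somewhere else.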




