Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/27 00:43:12 UTC

[GitHub] [iceberg] rdblue commented on pull request #2354: Core: add row key to format v2

rdblue commented on pull request #2354:
URL: https://github.com/apache/iceberg/pull/2354#issuecomment-808606039


   @jackye1995 and @openinx, I have a few questions about this before I'm comfortable merging it. Thanks for working on this so far!
   
   Why do we need to track multiple versions of the row identifier like we do for schema, partition spec, and sort order? I think of this as the "fields that identify a row". Is it helpful to have more than one view of how rows are identified?
   
   To answer that, we need to consider whether two versions are ever valid at the same time, and how row IDs are going to evolve over time:
   * Row identifier columns may be set, either to initialize the identifier or to fix a mistake (e.g., account_id was used where profile_id was intended)
   * Row identifier columns may be added when a new identifying column is added to the schema (e.g., adding profile_id to a table previously identified only by account_id)
   
   I think both of those operations only require setting the current way of identifying rows, not keeping track of the previous ways. I'm interested to hear what everyone thinks about that and whether there is agreement.
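
To make the two operations above concrete, here is a minimal sketch assuming the metadata keeps only the current identifier. `TableMetadata`, `set_row_identifier`, and `add_identifier_column` are hypothetical names chosen for illustration; they are not Iceberg's API or the code in this PR.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass
class TableMetadata:
    """Hypothetical, simplified table metadata (illustration only)."""
    columns: Set[str]
    # Only the current identifier is stored; no list of past versions.
    row_identifier: FrozenSet[str] = frozenset()

    def set_row_identifier(self, *names: str) -> None:
        """Overwrite the identifier, either to initialize it or to fix a mistake."""
        missing = set(names) - self.columns
        if missing:
            raise ValueError(f"unknown columns: {sorted(missing)}")
        self.row_identifier = frozenset(names)

    def add_identifier_column(self, name: str) -> None:
        """Add a new column to the schema and to the current identifier."""
        self.columns.add(name)
        self.row_identifier = self.row_identifier | {name}


# Operation 1: set, then fix a mistaken choice by overwriting it.
fixed = TableMetadata(columns={"account_id", "profile_id", "name"})
fixed.set_row_identifier("account_id")   # mistaken initial choice
fixed.set_row_identifier("profile_id")   # fix; the old value is not retained

# Operation 2: a new identifying column joins the existing identifier.
table = TableMetadata(columns={"account_id", "name"})
table.set_row_identifier("account_id")
table.add_identifier_column("profile_id")
```

In this sketch neither operation needs to consult an earlier identifier, which is the property the question above is asking about.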
   
   If I'm correct, then I would probably not keep track of multiple versions here. If I'm not, then I think we should ask whether the row ID columns should be tracked in the schema itself rather than separately versioned, since they will probably change at the same time the schema does -- when adding a new column that is now part of the identifier.
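
The alternative of tracking the identifier in the schema itself could look roughly like the sketch below. `Schema`, `identifier_field_ids`, and `add_identifier_column` are hypothetical names for illustration only, not the actual spec or the code in this PR.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Schema:
    """Hypothetical schema version that also carries the row identifier."""
    schema_id: int
    fields: Tuple[Tuple[int, str], ...]            # (field_id, name) pairs
    identifier_field_ids: FrozenSet[int] = frozenset()


def add_identifier_column(schema: Schema, field_id: int, name: str) -> Schema:
    """Return a new schema version with the column added and marked as identifying."""
    return Schema(
        schema_id=schema.schema_id + 1,
        fields=schema.fields + ((field_id, name),),
        identifier_field_ids=schema.identifier_field_ids | {field_id},
    )


v0 = Schema(schema_id=0,
            fields=((1, "account_id"), (2, "name")),
            identifier_field_ids=frozenset({1}))
v1 = add_identifier_column(v0, 3, "profile_id")
```

Here the identifier has no version of its own: adding the column and changing the identifier happen in the same schema change, and any history comes from the schema versions.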
   
   It would be great to hear from @aokolnychyi on this as well.

