Posted to dev@iceberg.apache.org by GitBox <gi...@apache.org> on 2018/12/05 22:11:59 UTC

[GitHub] mccheah opened a new issue #23: File identifiers and aliases

URL: https://github.com/apache/incubator-iceberg/issues/23
 
 
   A URI locating a file may not be enough for file I/O implementations to construct `InputFile` and `OutputFile` instances, as proposed in https://github.com/apache/incubator-iceberg/issues/12. More specifically, consider a system where a file has a given path, but that same path can be namespaced differently in different contexts. For example, the metadata for that file can evolve over time, as we discussed in https://github.com/apache/incubator-iceberg/issues/16.
   
   We propose adding a new optional field, `ExternalIdentifier`, to the `DataFile` schema. It would be a String tag that custom Iceberg consumers can use to find the file through their own unique identification system, so such systems could look up the file directly by this identifier in addition to its path.
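
To make the proposal concrete, here is a minimal Python sketch. It assumes a simplified `DataFile` record and a hypothetical consumer-side `IdentifierAwareResolver`; neither is Iceberg's actual API, and the identifier and location strings are made up for illustration. It shows the field as an optional string and a lookup that tries the consumer's own identifier first.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DataFile:
    """Hypothetical, simplified stand-in for a DataFile entry (not Iceberg's class)."""

    path: str
    # The proposed optional field: an opaque string tag interpreted only by
    # a custom consumer's own identification system; None when unset.
    external_identifier: Optional[str] = None


class IdentifierAwareResolver:
    """Hypothetical consumer-side lookup; not part of any Iceberg API.

    Resolves a file by its external identifier when the consumer knows that
    identifier, and falls back to the plain path otherwise.
    """

    def __init__(self, locations_by_id: Dict[str, str]) -> None:
        # The consumer's own registry: external identifier -> current location.
        self._locations_by_id = dict(locations_by_id)

    def resolve(self, data_file: DataFile) -> str:
        ext_id = data_file.external_identifier
        if ext_id is not None and ext_id in self._locations_by_id:
            return self._locations_by_id[ext_id]
        return data_file.path


# Example: one file carries an identifier the consumer recognizes, one does not.
resolver = IdentifierAwareResolver({"file-7f3a": "tenant-a://objects/7f3a"})
tagged = DataFile("s3://bucket/data/00001.parquet", external_identifier="file-7f3a")
untagged = DataFile("s3://bucket/data/00002.parquet")

print(resolver.resolve(tagged))    # tenant-a://objects/7f3a
print(resolver.resolve(untagged))  # s3://bucket/data/00002.parquet
```

Falling back to the path when the identifier is unset or unknown is an assumption of this sketch; the proposal itself only says the identifier would be usable in addition to the path. Keeping the field a plain string means Iceberg never interprets it, and all meaning lives in the consumer's registry.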
   
   Alternative representations that would allow a richer `ExternalIdentifier` include a byte blob or a `struct` whose schema is stored in the table properties. However, those representations could encourage more arbitrary and uncontrolled use of the field, which we probably want to avoid. A `String` seems to be the safest option.
