Posted to dev@iceberg.apache.org by GitBox <gi...@apache.org> on 2018/11/27 02:16:35 UTC

[GitHub] mccheah opened a new issue #16: Custom metadata in data files

URL: https://github.com/apache/incubator-iceberg/issues/16
 
 
   (Migrated from https://github.com/Netflix/iceberg/issues/106 with some extra details added)
   
   It would be useful for consumers of Iceberg tables to be able to attach additional metadata to data files that tells readers how to read those files. Some examples of custom metadata include:
   
   * Encryption keys required to read the file.
   * The compression codec used for the file, so that the codec does not have to be inferred from a specific file extension.
   * Metadata that is specific to a custom file format. Suppose we supported CSV tables in Iceberg down the road. It would be nice to attach the column delimiter on a per-file basis, so that a table could be composed of multiple files whose exact layouts differ but whose schemas are compatible.
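   To illustrate the CSV case, here is a minimal Java sketch of a reader that looks up a per-file delimiter in the custom metadata map and falls back to a comma when none is recorded. The key name `csv.delimiter` and the class `PerFileCsvReader` are hypothetical; Iceberg defines neither, and the naive split ignores CSV quoting rules.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class PerFileCsvReader {
    // Hypothetical metadata key; not defined by Iceberg.
    static final String DELIMITER_KEY = "csv.delimiter";

    // Splits one CSV line using the delimiter recorded for its file,
    // falling back to a comma when the file has no custom metadata or no delimiter entry.
    static List<String> splitLine(String line, Map<String, String> fileMetadata) {
        String delimiter = fileMetadata == null ? "," : fileMetadata.getOrDefault(DELIMITER_KEY, ",");
        // Pattern.quote treats delimiters such as "|" literally; limit -1 keeps trailing empty fields.
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        return List.of(parts);
    }

    public static void main(String[] args) {
        // Two files in the same table with different layouts but the same columns.
        System.out.println(splitLine("a,b,c", Map.of()));
        System.out.println(splitLine("a|b|c", Map.of(DELIMITER_KEY, "|")));
    }
}
```

   Both calls in `main` produce the same three fields, which is the point of recording the delimiter per file rather than per table.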
   
   The custom metadata field should be an optional column of type `Map<String, String>`.
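   A minimal sketch of what an entry carrying this optional column could look like, using a hypothetical `DataFileEntry` class rather than Iceberg's actual `DataFile` interface. An absent column is modeled as `Optional.empty()` so that readers can distinguish "no custom metadata" from an empty map; the paths and the `csv.delimiter` key are illustrative only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-in for a manifest data file entry; not Iceberg's DataFile.
public final class DataFileEntry {
    private final String path;
    private final long recordCount;
    // Optional column: null means the writer recorded no custom metadata for this file.
    private final Map<String, String> customMetadata;

    public DataFileEntry(String path, long recordCount, Map<String, String> customMetadata) {
        this.path = path;
        this.recordCount = recordCount;
        // Defensive, unmodifiable copy so the entry stays immutable after construction.
        this.customMetadata = customMetadata == null
            ? null
            : Collections.unmodifiableMap(new HashMap<>(customMetadata));
    }

    public String path() { return path; }

    public long recordCount() { return recordCount; }

    // An absent column surfaces as Optional.empty() rather than an empty map.
    public Optional<Map<String, String>> customMetadata() {
        return Optional.ofNullable(customMetadata);
    }

    public static void main(String[] args) {
        DataFileEntry plain = new DataFileEntry("data/00000.parquet", 100L, null);
        DataFileEntry csv = new DataFileEntry("data/00001.csv", 42L, Map.of("csv.delimiter", "|"));
        System.out.println(plain.customMetadata().isPresent()); // false
        System.out.println(csv.customMetadata().map(m -> m.get("csv.delimiter")).orElse(",")); // |
    }
}
```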
   
   Finally, consider the I/O submodule proposed in https://github.com/apache/incubator-iceberg/issues/12. The `newOutputFile` API in its `FileIO` should also return any custom metadata needed to read the file once it has been written. Thus `FileIO#newOutputFile` should return a struct containing an `OutputFile` object for writing the bytes and a `Map<String, String>` of metadata to be saved in the manifest after the data file is written.
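   A rough Java sketch of the proposed return type, assuming a hypothetical `MetadataAwareFileIO` interface and a simplified `SimpleOutputFile` stand-in instead of Iceberg's real `FileIO` and `OutputFile`. The in-memory implementation tags each new file with a random `encryption.key-id` value (an illustrative key name) to show where per-file read metadata would come from; no actual encryption is performed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class FileIOSketch {

    // Hypothetical minimal stand-in for an output file: just enough to write bytes.
    interface SimpleOutputFile {
        String location();
        OutputStream create() throws IOException;
    }

    // The proposed "struct": where to write the bytes, plus metadata to save in the manifest.
    static final class NewOutputFile {
        final SimpleOutputFile file;
        final Map<String, String> readMetadata;

        NewOutputFile(SimpleOutputFile file, Map<String, String> readMetadata) {
            this.file = file;
            this.readMetadata = Collections.unmodifiableMap(new HashMap<>(readMetadata));
        }
    }

    // Hypothetical FileIO variant whose newOutputFile returns the struct instead of a bare file.
    interface MetadataAwareFileIO {
        NewOutputFile newOutputFile(String path);
    }

    // In-memory implementation that assigns each file a fresh, fake key id.
    static final class InMemoryFileIO implements MetadataAwareFileIO {
        final Map<String, ByteArrayOutputStream> files = new HashMap<>();

        @Override
        public NewOutputFile newOutputFile(String path) {
            SimpleOutputFile out = new SimpleOutputFile() {
                @Override public String location() { return path; }

                @Override public OutputStream create() {
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    files.put(path, buffer);
                    return buffer;
                }
            };
            return new NewOutputFile(out, Map.of("encryption.key-id", UUID.randomUUID().toString()));
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryFileIO io = new InMemoryFileIO();
        NewOutputFile handle = io.newOutputFile("data/00000.csv");
        try (OutputStream stream = handle.file.create()) {
            stream.write("a|b|c\n".getBytes(StandardCharsets.UTF_8));
        }
        // After the write completes, the caller would copy readMetadata into the manifest entry.
        System.out.println(handle.file.location() + " -> " + handle.readMetadata.keySet());
    }
}
```

   One reason to return the metadata together with the output file, rather than through a separate call, is that the writer then has everything it needs to build the manifest entry without correlating the two afterwards.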
