Posted to dev@iceberg.apache.org by GitBox <gi...@apache.org> on 2018/12/08 00:25:54 UTC

[GitHub] rdblue opened a new issue #40: Add external schema mappings for files written with name-based schemas

URL: https://github.com/apache/incubator-iceberg/issues/40
 
 
   Files written by Iceberg writers contain Iceberg field IDs, which are used for column projection. Iceberg doesn't currently support tracking data files that were written by other systems and then added to Iceberg tables through the API, because those files lack field IDs. To support files written by non-Iceberg writers, Iceberg could support a table-level mapping from the names in a source schema to Iceberg field IDs.
   
   For example, a table with 2 columns might have an Avro schema mapping like this one, encoded as JSON in table properties:
   
   ```json
   [ {"field-id": 1, "names": ["id"]},
     {"field-id": 2, "names": ["data"]} ]
   ```
   
   When reading an Avro file, the [read schema](https://github.com/Netflix/iceberg/blob/master/core/src/main/java/com/netflix/iceberg/avro/BuildAvroProjection.java#L50) would be produced using the file's schema and the field IDs from the mapping. The `names` entry in each field mapping is a list so that aliases of a field can resolve to the same ID.
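
A minimal sketch of how such a mapping might be applied, written in Python purely for illustration. The function names `build_name_index` and `assign_field_ids` are hypothetical and are not part of any Iceberg API. The sketch handles only top-level fields, and returning `None` for a name that is missing from the mapping is an assumption, since the proposal does not say how unmapped fields should be treated.

```python
import json

# The mapping from the example above, as it might be stored in table properties.
MAPPING_JSON = """
[ {"field-id": 1, "names": ["id"]},
  {"field-id": 2, "names": ["data"]} ]
"""


def build_name_index(mapping):
    """Return a dict from every known name (aliases included) to its field ID."""
    index = {}
    for entry in mapping:
        for name in entry["names"]:
            # Assumption: a name may belong to only one field ID.
            if name in index and index[name] != entry["field-id"]:
                raise ValueError(f"name {name!r} maps to more than one field ID")
            index[name] = entry["field-id"]
    return index


def assign_field_ids(file_field_names, mapping):
    """Pair each field name from a file schema with its ID, or None if unmapped."""
    index = build_name_index(mapping)
    return {name: index.get(name) for name in file_field_names}


mapping = json.loads(MAPPING_JSON)
ids = assign_field_ids(["id", "data", "extra"], mapping)
```

Because `names` is a list, an alias costs nothing extra: a mapping entry such as `{"field-id": 1, "names": ["id", "identifier"]}` would let files that call the column either `id` or `identifier` resolve to field ID 1.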
