Posted to dev@iceberg.apache.org by GitBox <gi...@apache.org> on 2018/12/08 00:29:36 UTC

[GitHub] rdblue opened a new issue #41: Add an API to maintain external schema mappings

URL: https://github.com/apache/incubator-iceberg/issues/41
 
 
   Once Iceberg supports external schema mappings (#40), it should also provide an easy way to maintain them: when an external schema changes, Iceberg is notified and updates its mapping to match.
   
   For example, starting with this mapping:
   
   ```json
   [ {"field-id": 1, "names": ["id"]},
     {"field-id": 2, "names": ["data"]} ]
   ```
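
   A mapping like this can be held in memory as its list of entries plus an index from every external name to a field ID. The sketch below assumes the JSON layout shown above; `NameMapping` and `field_id` are hypothetical names for illustration, not Iceberg's API.

```python
import json
from typing import Optional


class NameMapping:
    """Hypothetical in-memory form of an external schema mapping."""

    def __init__(self, entries):
        # Each entry is {"field-id": int, "names": [str, ...]}; "field-id" may be absent
        self.entries = entries
        # Index every external name to its entry's field ID (None if unmapped)
        self.by_name = {}
        for entry in entries:
            for name in entry["names"]:
                self.by_name[name] = entry.get("field-id")

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text))

    def field_id(self, name) -> Optional[int]:
        """Return the Iceberg field ID for an external column name, or None."""
        return self.by_name.get(name)


mapping = NameMapping.from_json(
    '[{"field-id": 1, "names": ["id"]}, {"field-id": 2, "names": ["data"]}]'
)
print(mapping.field_id("id"))    # 1
print(mapping.field_id("data"))  # 2
print(mapping.field_id("ts"))    # None
```

   Because an entry can list several names, an old name and a new name can both resolve to the same field ID. In this sketch the index is built once, so a changed mapping is loaded as a new `NameMapping` rather than edited in place.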
   
   Suppose a new Avro schema is registered that renames `id` to `obj_id` and adds a `ts` field. The Avro schema lists `id` as an alias of `obj_id`, which tells Iceberg they are the same field, so Iceberg would add `obj_id` to the existing `id` entry. Because `ts` has no Iceberg column yet, Iceberg would add an unmapped entry for it (one with no `field-id`). The updated mapping would be:
   
   ```json
   [ {"field-id": 1, "names": ["obj_id", "id"]},
     {"field-id": 2, "names": ["data"]},
     {"names": ["ts"]} ]
   ```
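
   The Avro-side update can be sketched as follows, assuming the mapping is the list of dicts shown above and the registered schema is a parsed Avro record whose fields may carry an `aliases` list (a standard Avro field attribute). `update_from_avro` is a hypothetical helper, not an Iceberg API.

```python
import copy
import json


def update_from_avro(mapping, avro_schema):
    """Return a new mapping updated for a newly registered Avro record schema.

    Hypothetical helper: `mapping` is a list of {"field-id": ..., "names": [...]}
    dicts in the JSON form shown above; `avro_schema` is a parsed Avro record.
    """
    updated = copy.deepcopy(mapping)
    for avro_field in avro_schema["fields"]:
        name = avro_field["name"]
        # Avro aliases name the same field under its earlier names
        candidates = [name] + avro_field.get("aliases", [])
        entry = next(
            (e for e in updated if any(c in e["names"] for c in candidates)), None
        )
        if entry is None:
            # New field: record it without a field ID until Iceberg has a column
            updated.append({"names": [name]})
        elif name not in entry["names"]:
            # Renamed field: the current name goes first, old names are kept
            entry["names"].insert(0, name)
    return updated


mapping = [{"field-id": 1, "names": ["id"]}, {"field-id": 2, "names": ["data"]}]
avro_schema = json.loads("""
{"type": "record", "name": "event", "fields": [
  {"name": "obj_id", "type": "long", "aliases": ["id"]},
  {"name": "data", "type": "string"},
  {"name": "ts", "type": "long"}]}
""")
updated = update_from_avro(mapping, avro_schema)
print(json.dumps(updated))
# [{"field-id": 1, "names": ["obj_id", "id"]}, {"field-id": 2, "names": ["data"]}, {"names": ["ts"]}]
```

   Keeping the old name in the entry means data files written before the rename still resolve to field ID 1.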
   
   Next, if the Iceberg table schema is updated to add a `ts` column, Iceberg would match the new column by name to the unmapped `ts` entry and record its field ID, producing this mapping:
   
   ```json
   [ {"field-id": 1, "names": ["obj_id", "id"]},
     {"field-id": 2, "names": ["data"]},
     {"field-id": 3, "names": ["ts"]} ]
   ```
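
   The Iceberg-side step can be sketched the same way, assuming the table schema is available as a dict from column name to field ID. `update_from_iceberg` is a hypothetical helper; it fills in a field ID only for entries that do not have one yet, and never reuses an ID another entry already owns.

```python
import copy
import json


def update_from_iceberg(mapping, iceberg_columns):
    """Return a new mapping with field IDs filled in for newly matched entries.

    Hypothetical helper: `mapping` is a list of {"field-id": ..., "names": [...]}
    dicts; `iceberg_columns` maps each Iceberg column name to its field ID.
    """
    updated = copy.deepcopy(mapping)
    # Field IDs that some entry already owns must not be assigned twice
    taken = {e["field-id"] for e in updated if "field-id" in e}
    for i, entry in enumerate(updated):
        if "field-id" in entry:
            continue
        for name in entry["names"]:
            field_id = iceberg_columns.get(name)
            if field_id is not None and field_id not in taken:
                # Rebuild the entry so "field-id" comes first, as in the JSON above
                updated[i] = {"field-id": field_id, "names": entry["names"]}
                taken.add(field_id)
                break
    return updated


mapping = [
    {"field-id": 1, "names": ["obj_id", "id"]},
    {"field-id": 2, "names": ["data"]},
    {"names": ["ts"]},
]
# The Iceberg table keeps its original column names; `ts` is the new column
iceberg_columns = {"id": 1, "data": 2, "ts": 3}
updated = update_from_iceberg(mapping, iceberg_columns)
print(json.dumps(updated))
# [{"field-id": 1, "names": ["obj_id", "id"]}, {"field-id": 2, "names": ["data"]}, {"field-id": 3, "names": ["ts"]}]
```

   If the column is added in Iceberg first, this step finds nothing to match; once a later Avro change adds an unmapped `ts` entry, running the same match again assigns it the column's ID.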
   
   This keeps the table compatible with new Avro data files without changing anything in the Iceberg table except the mapping. A column can be added in either Iceberg or Avro first; its mapping entry is completed by name once the column exists in both schemas.
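
   To illustrate why only the mapping needs to change, here is a minimal sketch of how a reader could use it. It assumes records from an Avro file arrive as Python dicts keyed by the file's field names; `project` is a hypothetical helper, not how Iceberg's Avro reader is implemented. Each name is resolved to a field ID through the mapping, each Iceberg column is filled by ID, and a column the file lacks reads as None.

```python
def project(record, mapping, iceberg_columns):
    """Read one Avro record (field name -> value) as an Iceberg row.

    Hypothetical helper: names resolve to field IDs through the mapping;
    Iceberg columns are filled by field ID, or None when the file lacks them.
    """
    name_to_id = {}
    for entry in mapping:
        if "field-id" in entry:
            for name in entry["names"]:
                name_to_id[name] = entry["field-id"]
    values_by_id = {
        name_to_id[name]: value for name, value in record.items() if name in name_to_id
    }
    return {col: values_by_id.get(fid) for col, fid in iceberg_columns.items()}


mapping = [
    {"field-id": 1, "names": ["obj_id", "id"]},
    {"field-id": 2, "names": ["data"]},
    {"field-id": 3, "names": ["ts"]},
]
iceberg_columns = {"id": 1, "data": 2, "ts": 3}

# A file written before the rename and one written after it
old_row = project({"id": 7, "data": "a"}, mapping, iceberg_columns)
new_row = project({"obj_id": 8, "data": "b", "ts": 1000}, mapping, iceberg_columns)
print(old_row)  # {'id': 7, 'data': 'a', 'ts': None}
print(new_row)  # {'id': 8, 'data': 'b', 'ts': 1000}
```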
