Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/01 06:47:23 UTC

[GitHub] [iceberg] jackye1995 commented on pull request #3048: AWS: copy Iceberg schema to Glue

jackye1995 commented on pull request #3048:
URL: https://github.com/apache/iceberg/pull/3048#issuecomment-909964634


   @rdblue thanks for the quick review! 
   
   After experimenting with a few different approaches overnight, I think it is likely better to:
   1. directly map Iceberg partition fields to Glue partition keys, so that the Iceberg partition fields are effectively "virtual columns" in Glue
   2. add the full Iceberg field string as the comment of each column, so that users can see additional information such as optional/required, nested field column IDs, etc.
   
   Here is an example result stored in Glue:
   
   ```json
   {
       "Table": {
           "Name": "test",
           "DatabaseName": "jack",
           "CreateTime": "2021-08-31T22:56:30-07:00",
           "UpdateTime": "2021-08-31T22:56:30-07:00",
           "Retention": 0,
           "StorageDescriptor": {
               "Columns": [
                   {
                       "Name": "i",
                       "Type": "int",
                       "Comment": "Iceberg column: { 1: i: required int }"
                   },
                   {
                       "Name": "l",
                       "Type": "bigint",
                       "Comment": "Iceberg column: { 2: l: required long }"
                   },
                   {
                       "Name": "d",
                       "Type": "date",
                       "Comment": "Iceberg column: { 3: d: required date }"
                   },
                   {
                       "Name": "t",
                       "Type": "string",
                       "Comment": "Iceberg column: { 4: t: required time }"
                   },
                   {
                       "Name": "ts",
                       "Type": "timestamp",
                       "Comment": "Iceberg column: { 5: ts: required timestamp }"
                   },
                   {
                       "Name": "tstz",
                       "Type": "timestamp",
                       "Comment": "Iceberg column: { 6: tstz: required timestamptz }"
                   },
                   {
                       "Name": "dec",
                       "Type": "decimal(9,2)",
                       "Comment": "Iceberg column: { 7: dec: required decimal(9, 2) }"
                   },
                   {
                       "Name": "s",
                       "Type": "string",
                       "Comment": "Iceberg column: { 8: s: required string }"
                   },
                   {
                       "Name": "u",
                       "Type": "string",
                       "Comment": "Iceberg column: { 9: u: required uuid }"
                   },
                   {
                       "Name": "f",
                       "Type": "binary",
                       "Comment": "Iceberg column: { 10: f: required fixed[3] }"
                   },
                   {
                       "Name": "b",
                       "Type": "binary",
                       "Comment": "Iceberg column: { 11: b: required binary }"
                   },
                   {
                       "Name": "struct",
                       "Type": "struct<i2:int,l2:bigint,d2:date>",
                       "Comment": "Iceberg column: { 12: struct: required struct<15: i2: required int, 16: l2: required long, 17: d2: required date> }"
                   },
                   {
                       "Name": "list",
                       "Type": "array<struct<i3:int,l3:bigint,d3:date>>",
                       "Comment": "Iceberg column: { 13: list: required list<struct<19: i3: required int, 20: l3: required long, 21: d3: required date>> }"
                   },
                   {
                       "Name": "map",
                       "Type": "map<string,struct<i4:int,l5:bigint,d6:date>>",
                       "Comment": "Iceberg column: { 14: map: required map<string, struct<24: i4: required int, 25: l5: required long, 26: d6: required date>> }"
                   }
               ],
               "Location": "s3://bucket/path",
               "Compressed": false,
               "NumberOfBuckets": 0,
               "SortColumns": [],
               "StoredAsSubDirectories": false
           },
           "PartitionKeys": [
               {
                   "Name": "s_bucket",
                   "Type": "int",
                   "Comment": "Iceberg partition field: {1000: s_bucket: bucket[16](8)}"
               },
               {
                   "Name": "map.i4_trunc",
                   "Type": "int",
                   "Comment": "Iceberg partition field: {1001: map.i4_trunc: truncate[2](24)}"
               },
               {
                   "Name": "ts_day",
                   "Type": "date",
                   "Comment": "Iceberg partition field: {1002: ts_day: day(5)}"
               },
               {
                   "Name": "i",
                   "Type": "int",
                   "Comment": "Iceberg partition field: {1003: i: identity(1)}"
               },
               {
                   "Name": "struct.i2",
                   "Type": "int",
                   "Comment": "Iceberg partition field: {1004: struct.i2: identity(15)}"
               }
           ],
           "TableType": "EXTERNAL_TABLE",
           "Parameters": {
               "metadata_location": "s3://bucket/path",
               "table_type": "ICEBERG"
           }
       }
   }
   ```
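
As a rough illustration of the two points above, here is a minimal Python sketch that rebuilds a few of the Glue entries shown in the example. It is not the Java code in the PR: `Field`, `to_glue_type`, `glue_column`, and `glue_partition_key` are hypothetical names, the type table is only read off the example output, and nested types (struct, list, map) are left out.

```python
from dataclasses import dataclass

# Primitive Iceberg type -> Glue type, as observed in the example above.
# This table is an assumption drawn from that output, not from the PR's code.
PRIMITIVE_TO_GLUE = {
    "int": "int",
    "long": "bigint",
    "date": "date",
    "time": "string",
    "timestamp": "timestamp",
    "timestamptz": "timestamp",
    "string": "string",
    "uuid": "string",
    "binary": "binary",
}


@dataclass
class Field:
    """Hypothetical stand-in for an Iceberg top-level field."""
    field_id: int
    name: str
    type_str: str  # Iceberg type as printed, e.g. "long" or "decimal(9, 2)"
    required: bool = True


def to_glue_type(iceberg_type: str) -> str:
    """Map a primitive Iceberg type string to the Glue type seen in the example."""
    if iceberg_type.startswith("decimal("):
        # The example shows "decimal(9, 2)" stored as "decimal(9,2)".
        return iceberg_type.replace(" ", "")
    if iceberg_type.startswith("fixed["):
        return "binary"
    return PRIMITIVE_TO_GLUE[iceberg_type]


def glue_column(field: Field) -> dict:
    """Build a Glue column whose comment carries the full Iceberg field string."""
    req = "required" if field.required else "optional"
    comment = (
        f"Iceberg column: {{ {field.field_id}: {field.name}: "
        f"{req} {field.type_str} }}"
    )
    return {
        "Name": field.name,
        "Type": to_glue_type(field.type_str),
        "Comment": comment,
    }


def glue_partition_key(field_id: int, name: str, transform: str,
                       source_id: int, result_type: str) -> dict:
    """Build a Glue partition key ("virtual column") for one partition field."""
    comment = (
        f"Iceberg partition field: "
        f"{{{field_id}: {name}: {transform}({source_id})}}"
    )
    return {"Name": name, "Type": result_type, "Comment": comment}


if __name__ == "__main__":
    print(glue_column(Field(2, "l", "long")))
    print(glue_partition_key(1000, "s_bucket", "bucket[16]", 8, "int"))
```

The partition key's result type is passed in explicitly here because it depends on the transform (for example, `day` on a timestamp yields `date` in the example above); deriving it automatically is outside the scope of this sketch.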


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


