Posted to dev@atlas.apache.org by "Madhan Neethiraj (JIRA)" <ji...@apache.org> on 2016/05/17 00:04:12 UTC

[jira] [Commented] (ATLAS-772) Ordering of columns is not maintained in schema query response, where as hive table entity response maintains the ordering

    [ https://issues.apache.org/jira/browse/ATLAS-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285688#comment-15285688 ] 

Madhan Neethiraj commented on ATLAS-772:
----------------------------------------

[~ayubkhan] I think details like the order of columns in a table may not be necessary. If you are aware of any use cases that require Atlas to maintain the order of columns in a table, can you please add details?

> Ordering of columns is not maintained in schema query response, where as hive table entity response maintains the ordering
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-772
>                 URL: https://issues.apache.org/jira/browse/ATLAS-772
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 0.7-incubating
>            Reporter: Ayub Khan
>             Fix For: 0.7-incubating
>
>
> The ordering of columns is not maintained in the schema query response, whereas the hive table entity response maintains the ordering.
> Table schema
> {noformat}
> 0: jdbc:hive2://localhost:10000/default> describe formatted table_pbrscdldkm;
> +-------------------------------+------------------------------------------------------------------------------+--------------------------------+--+
> |           col_name            |                                  data_type                                   |            comment             |
> +-------------------------------+------------------------------------------------------------------------------+--------------------------------+--+
> | # col_name                    | data_type                                                                    | comment                        |
> |                               | NULL                                                                         | NULL                           |
> | viewtime                      | int                                                                          |                                |
> | userid                        | bigint                                                                       |                                |
> | page_url                      | string                                                                       |                                |
> | referrer_url                  | string                                                                       |                                |
> | ip                            | string                                                                       |                                |
> |                               | NULL                                                                         | NULL                           |
> | # Partition Information       | NULL                                                                         | NULL                           |
> | # col_name                    | data_type                                                                    | comment                        |
> |                               | NULL                                                                         | NULL                           |
> | dt                            | string                                                                       |                                |
> | country                       | string                                                                       | partitioned columns comments.  |
> |                               | NULL                                                                         | NULL                           |
> | # Detailed Table Information  | NULL                                                                         | NULL                           |
> | Database:                     | db2pbrscdldkm                                                                | NULL                           |
> | Owner:                        | apathan                                                                      | NULL                           |
> | CreateTime:                   | Tue May 10 16:36:56 IST 2016                                                 | NULL                           |
> | LastAccessTime:               | UNKNOWN                                                                      | NULL                           |
> | Protect Mode:                 | None                                                                         | NULL                           |
> | Retention:                    | 0                                                                            | NULL                           |
> | Location:                     | hdfs://localhost:9000/user/hive/warehouse/db2pbrscdldkm.db/table_pbrscdldkm  | NULL                           |
> | Table Type:                   | MANAGED_TABLE                                                                | NULL                           |
> | Table Parameters:             | NULL                                                                         | NULL                           |
> |                               | last_modified_by                                                             | apathan                        |
> |                               | last_modified_time                                                           | 1462878417                     |
> |                               | transient_lastDdlTime                                                        | 1462878417                     |
> |                               | NULL                                                                         | NULL                           |
> | # Storage Information         | NULL                                                                         | NULL                           |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.avro.AvroSerDe                                 | NULL                           |
> | InputFormat:                  | org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat                   | NULL                           |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat                  | NULL                           |
> | Compressed:                   | No                                                                           | NULL                           |
> | Num Buckets:                  | -1                                                                           | NULL                           |
> | Bucket Columns:               | []                                                                           | NULL                           |
> | Sort Columns:                 | []                                                                           | NULL                           |
> | Storage Desc Params:          | NULL                                                                         | NULL                           |
> |                               | serialization.format                                                         | 1                              |
> +-------------------------------+------------------------------------------------------------------------------+--------------------------------+--+
> 38 rows selected (0.691 seconds)
> {noformat}
> Hive table entity query response, which shows that the column ordering is maintained as above
> {noformat}
> curl http://admin:admin@localhost:21000/api/atlas/entities/2d63c256-aee1-47f6-abdc-9db472764585 | python -m json.tool
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  6625    0  6625    0     0  65784      0 --:--:-- --:--:-- --:--:-- 66250
> {
>     "GUID": "2d63c256-aee1-47f6-abdc-9db472764585",
>     "definition": {
>         "id": {
>             "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>             "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>             "state": "ACTIVE",
>             "typeName": "hive_table",
>             "version": 0
>         },
>         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
>         "traitNames": [],
>         "traits": {},
>         "typeName": "hive_table",
>         "values": {
>             "columns": [
>                 {
>                     "id": {
>                         "id": "f0115d35-c768-476b-917c-3a243085d1ff",
>                         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                         "state": "ACTIVE",
>                         "typeName": "hive_column",
>                         "version": 0
>                     },
>                     "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
>                     "traitNames": [],
>                     "traits": {},
>                     "typeName": "hive_column",
>                     "values": {
>                         "comment": null,
>                         "name": "viewtime",
>                         "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.viewtime@primary",
>                         "table": {
>                             "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                             "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                             "state": "ACTIVE",
>                             "typeName": "hive_table",
>                             "version": 0
>                         },
>                         "type": "int"
>                     }
>                 },
>                 {
>                     "id": {
>                         "id": "642b6b3a-1e5a-4a06-844e-6fd71ae036b2",
>                         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                         "state": "ACTIVE",
>                         "typeName": "hive_column",
>                         "version": 0
>                     },
>                     "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
>                     "traitNames": [],
>                     "traits": {},
>                     "typeName": "hive_column",
>                     "values": {
>                         "comment": null,
>                         "name": "userid",
>                         "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.userid@primary",
>                         "table": {
>                             "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                             "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                             "state": "ACTIVE",
>                             "typeName": "hive_table",
>                             "version": 0
>                         },
>                         "type": "bigint"
>                     }
>                 },
>                 {
>                     "id": {
>                         "id": "9b14560e-6471-4a2e-b495-1f08bfad37d3",
>                         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                         "state": "ACTIVE",
>                         "typeName": "hive_column",
>                         "version": 0
>                     },
>                     "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
>                     "traitNames": [],
>                     "traits": {},
>                     "typeName": "hive_column",
>                     "values": {
>                         "comment": null,
>                         "name": "page_url",
>                         "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.page_url@primary",
>                         "table": {
>                             "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                             "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                             "state": "ACTIVE",
>                             "typeName": "hive_table",
>                             "version": 0
>                         },
>                         "type": "string"
>                     }
>                 },
>                 {
>                     "id": {
>                         "id": "8ca2072f-2b98-4b19-9a17-2e3d125ebbd6",
>                         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                         "state": "ACTIVE",
>                         "typeName": "hive_column",
>                         "version": 0
>                     },
>                     "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
>                     "traitNames": [],
>                     "traits": {},
>                     "typeName": "hive_column",
>                     "values": {
>                         "comment": null,
>                         "name": "referrer_url",
>                         "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.referrer_url@primary",
>                         "table": {
>                             "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                             "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                             "state": "ACTIVE",
>                             "typeName": "hive_table",
>                             "version": 0
>                         },
>                         "type": "string"
>                     }
>                 },
>                 {
>                     "id": {
>                         "id": "effd4c89-8795-4e54-bd26-f7a182d58c79",
>                         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                         "state": "ACTIVE",
>                         "typeName": "hive_column",
>                         "version": 0
>                     },
>                     "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
>                     "traitNames": [],
>                     "traits": {},
>                     "typeName": "hive_column",
>                     "values": {
>                         "comment": null,
>                         "name": "ip",
>                         "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.ip@primary",
>                         "table": {
>                             "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                             "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                             "state": "ACTIVE",
>                             "typeName": "hive_table",
>                             "version": 0
>                         },
>                         "type": "string"
>                     }
>                 }
>             ],
>             "comment": null,
>             "createTime": "2016-05-10T11:06:57.000Z",
>             "db": {
>                 "id": "8abe3108-cd0a-42cb-a9f4-54b2256d9ef0",
>                 "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                 "state": "ACTIVE",
>                 "typeName": "hive_db",
>                 "version": 0
>             },
>             "description": null,
>             "lastAccessTime": "2016-05-10T11:06:57.000Z",
>             "name": "db2pbrscdldkm.table_pbrscdldkm@primary",
>             "owner": "apathan",
>             "parameters": {
>                 "last_modified_by": "apathan",
>                 "last_modified_time": "1462878417",
>                 "transient_lastDdlTime": "1462878417"
>             },
>             "partitionKeys": [
>                 {
>                     "id": {
>                         "id": "baa21d89-e899-4f97-8164-7c811cd0b44b",
>                         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                         "state": "ACTIVE",
>                         "typeName": "hive_column",
>                         "version": 0
>                     },
>                     "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
>                     "traitNames": [],
>                     "traits": {},
>                     "typeName": "hive_column",
>                     "values": {
>                         "comment": null,
>                         "name": "dt",
>                         "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.dt@primary",
>                         "table": {
>                             "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                             "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                             "state": "ACTIVE",
>                             "typeName": "hive_table",
>                             "version": 0
>                         },
>                         "type": "string"
>                     }
>                 },
>                 {
>                     "id": {
>                         "id": "b5e6ce71-21e4-4814-a5da-82b25a71d27c",
>                         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                         "state": "ACTIVE",
>                         "typeName": "hive_column",
>                         "version": 0
>                     },
>                     "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
>                     "traitNames": [],
>                     "traits": {},
>                     "typeName": "hive_column",
>                     "values": {
>                         "comment": "partitioned columns comments.",
>                         "name": "country",
>                         "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.country@primary",
>                         "table": {
>                             "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                             "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                             "state": "ACTIVE",
>                             "typeName": "hive_table",
>                             "version": 0
>                         },
>                         "type": "string"
>                     }
>                 }
>             ],
>             "retention": 0,
>             "sd": {
>                 "id": {
>                     "id": "6a7ce759-6dfa-4130-bde6-9bdeff64da39",
>                     "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                     "state": "ACTIVE",
>                     "typeName": "hive_storagedesc",
>                     "version": 0
>                 },
>                 "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
>                 "traitNames": [],
>                 "traits": {},
>                 "typeName": "hive_storagedesc",
>                 "values": {
>                     "bucketCols": null,
>                     "compressed": false,
>                     "inputFormat": "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat",
>                     "location": "hdfs://localhost:9000/user/hive/warehouse/db2pbrscdldkm.db/table_pbrscdldkm",
>                     "numBuckets": -1,
>                     "outputFormat": "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat",
>                     "parameters": null,
>                     "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm@primary_storage",
>                     "serdeInfo": {
>                         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
>                         "typeName": "hive_serde",
>                         "values": {
>                             "name": null,
>                             "parameters": {
>                                 "serialization.format": "1"
>                             },
>                             "serializationLib": "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
>                         }
>                     },
>                     "sortCols": null,
>                     "storedAsSubDirectories": false,
>                     "table": {
>                         "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                         "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
>                         "state": "ACTIVE",
>                         "typeName": "hive_table",
>                         "version": 0
>                     }
>                 }
>             },
>             "tableName": "table_pbrscdldkm",
>             "tableType": "MANAGED_TABLE",
>             "temporary": false,
>             "viewExpandedText": null,
>             "viewOriginalText": null
>         }
>     },
>     "requestId": "qtp1576861390-13 - 7211cfb0-ad04-48d9-948a-4e8dbab65e17"
> }
> {noformat}
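> Hive schema query response, in which the columns come back in a different order (viewtime, referrer_url, page_url, ip, userid)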
> {noformat}
> curl http://admin:admin@localhost:21000/api/atlas/lineage/hive/table/db2pbrscdldkm.table_pbrscdldkm@primary/schema | python -m json.tool
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  2946    0  2946    0     0   4349      0 --:--:-- --:--:-- --:--:--  4351
> {
>     "requestId": "qtp1576861390-14 - b74c229b-02fc-4115-9f3f-187e949f3966",
>     "results": {
>         "dataType": {
>             "attributeDefinitions": [
>                 {
>                     "dataTypeName": "string",
>                     "isComposite": false,
>                     "isIndexable": true,
>                     "isUnique": false,
>                     "multiplicity": {
>                         "isUnique": false,
>                         "lower": 1,
>                         "upper": 1
>                     },
>                     "name": "name",
>                     "reverseAttributeName": null
>                 },
>                 {
>                     "dataTypeName": "string",
>                     "isComposite": false,
>                     "isIndexable": true,
>                     "isUnique": false,
>                     "multiplicity": {
>                         "isUnique": false,
>                         "lower": 1,
>                         "upper": 1
>                     },
>                     "name": "type",
>                     "reverseAttributeName": null
>                 },
>                 {
>                     "dataTypeName": "string",
>                     "isComposite": false,
>                     "isIndexable": true,
>                     "isUnique": false,
>                     "multiplicity": {
>                         "isUnique": false,
>                         "lower": 0,
>                         "upper": 1
>                     },
>                     "name": "comment",
>                     "reverseAttributeName": null
>                 },
>                 {
>                     "dataTypeName": "hive_table",
>                     "isComposite": false,
>                     "isIndexable": true,
>                     "isUnique": false,
>                     "multiplicity": {
>                         "isUnique": false,
>                         "lower": 0,
>                         "upper": 1
>                     },
>                     "name": "table",
>                     "reverseAttributeName": "columns"
>                 }
>             ],
>             "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.ClassType",
>             "superTypes": [
>                 "Referenceable"
>             ],
>             "typeDescription": null,
>             "typeName": "hive_column"
>         },
>         "query": "hive_table where (name = \"db2pbrscdldkm.table_pbrscdldkm@primary\") columns",
>         "rows": [
>             {
>                 "$id$": {
>                     "$typeName$": "hive_column",
>                     "id": "f0115d35-c768-476b-917c-3a243085d1ff",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "$typeName$": "hive_column",
>                 "comment": null,
>                 "name": "viewtime",
>                 "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.viewtime@primary",
>                 "table": {
>                     "$typeName$": "hive_table",
>                     "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "type": "int"
>             },
>             {
>                 "$id$": {
>                     "$typeName$": "hive_column",
>                     "id": "8ca2072f-2b98-4b19-9a17-2e3d125ebbd6",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "$typeName$": "hive_column",
>                 "comment": null,
>                 "name": "referrer_url",
>                 "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.referrer_url@primary",
>                 "table": {
>                     "$typeName$": "hive_table",
>                     "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "type": "string"
>             },
>             {
>                 "$id$": {
>                     "$typeName$": "hive_column",
>                     "id": "9b14560e-6471-4a2e-b495-1f08bfad37d3",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "$typeName$": "hive_column",
>                 "comment": null,
>                 "name": "page_url",
>                 "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.page_url@primary",
>                 "table": {
>                     "$typeName$": "hive_table",
>                     "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "type": "string"
>             },
>             {
>                 "$id$": {
>                     "$typeName$": "hive_column",
>                     "id": "effd4c89-8795-4e54-bd26-f7a182d58c79",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "$typeName$": "hive_column",
>                 "comment": null,
>                 "name": "ip",
>                 "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.ip@primary",
>                 "table": {
>                     "$typeName$": "hive_table",
>                     "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "type": "string"
>             },
>             {
>                 "$id$": {
>                     "$typeName$": "hive_column",
>                     "id": "642b6b3a-1e5a-4a06-844e-6fd71ae036b2",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "$typeName$": "hive_column",
>                 "comment": null,
>                 "name": "userid",
>                 "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.userid@primary",
>                 "table": {
>                     "$typeName$": "hive_table",
>                     "id": "2d63c256-aee1-47f6-abdc-9db472764585",
>                     "state": "ACTIVE",
>                     "version": 0
>                 },
>                 "type": "bigint"
>             }
>         ]
>     },
>     "tableName": "db2pbrscdldkm.table_pbrscdldkm@primary"
> }
> {noformat}
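
For a client that needs the Hive column order, one possible workaround is to reorder the schema rows on the client side using the entity response. The Python sketch below is only an illustration and is not part of Atlas: the function names and the trimmed sample payloads are hypothetical, and it assumes both responses have the shapes quoted in the issue description (columns under definition.values.columns, rows under results.rows, matched by GUID).

```python
# Trimmed stand-ins for the two responses quoted in the issue. Only the
# GUID and name of each column are kept; everything else is omitted.
ENTITY_COLUMNS = [
    ("f0115d35-c768-476b-917c-3a243085d1ff", "viewtime"),
    ("642b6b3a-1e5a-4a06-844e-6fd71ae036b2", "userid"),
    ("9b14560e-6471-4a2e-b495-1f08bfad37d3", "page_url"),
    ("8ca2072f-2b98-4b19-9a17-2e3d125ebbd6", "referrer_url"),
    ("effd4c89-8795-4e54-bd26-f7a182d58c79", "ip"),
]
SCHEMA_ROWS = [
    ("f0115d35-c768-476b-917c-3a243085d1ff", "viewtime"),
    ("8ca2072f-2b98-4b19-9a17-2e3d125ebbd6", "referrer_url"),
    ("9b14560e-6471-4a2e-b495-1f08bfad37d3", "page_url"),
    ("effd4c89-8795-4e54-bd26-f7a182d58c79", "ip"),
    ("642b6b3a-1e5a-4a06-844e-6fd71ae036b2", "userid"),
]

ENTITY_RESPONSE = {
    "definition": {
        "values": {
            "columns": [
                {"id": {"id": guid}, "values": {"name": name}}
                for guid, name in ENTITY_COLUMNS
            ]
        }
    }
}
SCHEMA_RESPONSE = {
    "results": {
        "rows": [{"$id$": {"id": guid}, "name": name} for guid, name in SCHEMA_ROWS]
    }
}


def entity_column_names(entity_response):
    """Column names in the order listed by the entity response."""
    columns = entity_response["definition"]["values"]["columns"]
    return [column["values"]["name"] for column in columns]


def schema_column_names(schema_response):
    """Column names in the order returned by the schema query."""
    return [row["name"] for row in schema_response["results"]["rows"]]


def reorder_schema_rows(schema_response, entity_response):
    """Return the schema rows sorted into the entity's column order.

    Rows are matched to entity columns by GUID. Rows whose GUID is not
    found in the entity keep their relative order and are placed last.
    """
    columns = entity_response["definition"]["values"]["columns"]
    position = {column["id"]["id"]: index for index, column in enumerate(columns)}
    rows = schema_response["results"]["rows"]
    return sorted(rows, key=lambda row: position.get(row["$id$"]["id"], len(position)))


if __name__ == "__main__":
    print("entity order:   ", entity_column_names(ENTITY_RESPONSE))
    print("schema order:   ", schema_column_names(SCHEMA_RESPONSE))
    reordered = reorder_schema_rows(SCHEMA_RESPONSE, ENTITY_RESPONSE)
    print("reordered rows: ", [row["name"] for row in reordered])
```

This costs an extra entity lookup per table, so it is at best a stopgap on the client side; it does not settle whether the schema query itself should preserve the order.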



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)