Posted to dev@atlas.apache.org by "Ayub Khan (JIRA)" <ji...@apache.org> on 2016/05/11 06:58:12 UTC

[jira] [Created] (ATLAS-772) Ordering of columns is not maintained in schema query response, whereas Hive table entity response maintains the ordering

Ayub Khan created ATLAS-772:
-------------------------------

             Summary: Ordering of columns is not maintained in schema query response, whereas Hive table entity response maintains the ordering
                 Key: ATLAS-772
                 URL: https://issues.apache.org/jira/browse/ATLAS-772
             Project: Atlas
          Issue Type: Bug
    Affects Versions: 0.7-incubating
            Reporter: Ayub Khan
             Fix For: 0.7-incubating


The schema query response does not preserve the ordering of columns, whereas the Hive table entity response returns the columns in the same order as the table definition.

Table schema
{noformat}
0: jdbc:hive2://localhost:10000/default> describe formatted table_pbrscdldkm;
+-------------------------------+------------------------------------------------------------------------------+--------------------------------+--+
|           col_name            |                                  data_type                                   |            comment             |
+-------------------------------+------------------------------------------------------------------------------+--------------------------------+--+
| # col_name                    | data_type                                                                    | comment                        |
|                               | NULL                                                                         | NULL                           |
| viewtime                      | int                                                                          |                                |
| userid                        | bigint                                                                       |                                |
| page_url                      | string                                                                       |                                |
| referrer_url                  | string                                                                       |                                |
| ip                            | string                                                                       |                                |
|                               | NULL                                                                         | NULL                           |
| # Partition Information       | NULL                                                                         | NULL                           |
| # col_name                    | data_type                                                                    | comment                        |
|                               | NULL                                                                         | NULL                           |
| dt                            | string                                                                       |                                |
| country                       | string                                                                       | partitioned columns comments.  |
|                               | NULL                                                                         | NULL                           |
| # Detailed Table Information  | NULL                                                                         | NULL                           |
| Database:                     | db2pbrscdldkm                                                                | NULL                           |
| Owner:                        | apathan                                                                      | NULL                           |
| CreateTime:                   | Tue May 10 16:36:56 IST 2016                                                 | NULL                           |
| LastAccessTime:               | UNKNOWN                                                                      | NULL                           |
| Protect Mode:                 | None                                                                         | NULL                           |
| Retention:                    | 0                                                                            | NULL                           |
| Location:                     | hdfs://localhost:9000/user/hive/warehouse/db2pbrscdldkm.db/table_pbrscdldkm  | NULL                           |
| Table Type:                   | MANAGED_TABLE                                                                | NULL                           |
| Table Parameters:             | NULL                                                                         | NULL                           |
|                               | last_modified_by                                                             | apathan                        |
|                               | last_modified_time                                                           | 1462878417                     |
|                               | transient_lastDdlTime                                                        | 1462878417                     |
|                               | NULL                                                                         | NULL                           |
| # Storage Information         | NULL                                                                         | NULL                           |
| SerDe Library:                | org.apache.hadoop.hive.serde2.avro.AvroSerDe                                 | NULL                           |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat                   | NULL                           |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat                  | NULL                           |
| Compressed:                   | No                                                                           | NULL                           |
| Num Buckets:                  | -1                                                                           | NULL                           |
| Bucket Columns:               | []                                                                           | NULL                           |
| Sort Columns:                 | []                                                                           | NULL                           |
| Storage Desc Params:          | NULL                                                                         | NULL                           |
|                               | serialization.format                                                         | 1                              |
+-------------------------------+------------------------------------------------------------------------------+--------------------------------+--+
38 rows selected (0.691 seconds)
{noformat}


Hive table entity query response, which lists the columns in the same order as the table schema above (viewtime, userid, page_url, referrer_url, ip)
{noformat}
curl http://admin:admin@localhost:21000/api/atlas/entities/2d63c256-aee1-47f6-abdc-9db472764585 | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6625    0  6625    0     0  65784      0 --:--:-- --:--:-- --:--:-- 66250
{
    "GUID": "2d63c256-aee1-47f6-abdc-9db472764585",
    "definition": {
        "id": {
            "id": "2d63c256-aee1-47f6-abdc-9db472764585",
            "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "state": "ACTIVE",
            "typeName": "hive_table",
            "version": 0
        },
        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "traitNames": [],
        "traits": {},
        "typeName": "hive_table",
        "values": {
            "columns": [
                {
                    "id": {
                        "id": "f0115d35-c768-476b-917c-3a243085d1ff",
                        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                        "state": "ACTIVE",
                        "typeName": "hive_column",
                        "version": 0
                    },
                    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
                    "traitNames": [],
                    "traits": {},
                    "typeName": "hive_column",
                    "values": {
                        "comment": null,
                        "name": "viewtime",
                        "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.viewtime@primary",
                        "table": {
                            "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                            "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                            "state": "ACTIVE",
                            "typeName": "hive_table",
                            "version": 0
                        },
                        "type": "int"
                    }
                },
                {
                    "id": {
                        "id": "642b6b3a-1e5a-4a06-844e-6fd71ae036b2",
                        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                        "state": "ACTIVE",
                        "typeName": "hive_column",
                        "version": 0
                    },
                    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
                    "traitNames": [],
                    "traits": {},
                    "typeName": "hive_column",
                    "values": {
                        "comment": null,
                        "name": "userid",
                        "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.userid@primary",
                        "table": {
                            "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                            "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                            "state": "ACTIVE",
                            "typeName": "hive_table",
                            "version": 0
                        },
                        "type": "bigint"
                    }
                },
                {
                    "id": {
                        "id": "9b14560e-6471-4a2e-b495-1f08bfad37d3",
                        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                        "state": "ACTIVE",
                        "typeName": "hive_column",
                        "version": 0
                    },
                    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
                    "traitNames": [],
                    "traits": {},
                    "typeName": "hive_column",
                    "values": {
                        "comment": null,
                        "name": "page_url",
                        "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.page_url@primary",
                        "table": {
                            "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                            "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                            "state": "ACTIVE",
                            "typeName": "hive_table",
                            "version": 0
                        },
                        "type": "string"
                    }
                },
                {
                    "id": {
                        "id": "8ca2072f-2b98-4b19-9a17-2e3d125ebbd6",
                        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                        "state": "ACTIVE",
                        "typeName": "hive_column",
                        "version": 0
                    },
                    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
                    "traitNames": [],
                    "traits": {},
                    "typeName": "hive_column",
                    "values": {
                        "comment": null,
                        "name": "referrer_url",
                        "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.referrer_url@primary",
                        "table": {
                            "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                            "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                            "state": "ACTIVE",
                            "typeName": "hive_table",
                            "version": 0
                        },
                        "type": "string"
                    }
                },
                {
                    "id": {
                        "id": "effd4c89-8795-4e54-bd26-f7a182d58c79",
                        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                        "state": "ACTIVE",
                        "typeName": "hive_column",
                        "version": 0
                    },
                    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
                    "traitNames": [],
                    "traits": {},
                    "typeName": "hive_column",
                    "values": {
                        "comment": null,
                        "name": "ip",
                        "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.ip@primary",
                        "table": {
                            "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                            "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                            "state": "ACTIVE",
                            "typeName": "hive_table",
                            "version": 0
                        },
                        "type": "string"
                    }
                }
            ],
            "comment": null,
            "createTime": "2016-05-10T11:06:57.000Z",
            "db": {
                "id": "8abe3108-cd0a-42cb-a9f4-54b2256d9ef0",
                "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "state": "ACTIVE",
                "typeName": "hive_db",
                "version": 0
            },
            "description": null,
            "lastAccessTime": "2016-05-10T11:06:57.000Z",
            "name": "db2pbrscdldkm.table_pbrscdldkm@primary",
            "owner": "apathan",
            "parameters": {
                "last_modified_by": "apathan",
                "last_modified_time": "1462878417",
                "transient_lastDdlTime": "1462878417"
            },
            "partitionKeys": [
                {
                    "id": {
                        "id": "baa21d89-e899-4f97-8164-7c811cd0b44b",
                        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                        "state": "ACTIVE",
                        "typeName": "hive_column",
                        "version": 0
                    },
                    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
                    "traitNames": [],
                    "traits": {},
                    "typeName": "hive_column",
                    "values": {
                        "comment": null,
                        "name": "dt",
                        "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.dt@primary",
                        "table": {
                            "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                            "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                            "state": "ACTIVE",
                            "typeName": "hive_table",
                            "version": 0
                        },
                        "type": "string"
                    }
                },
                {
                    "id": {
                        "id": "b5e6ce71-21e4-4814-a5da-82b25a71d27c",
                        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                        "state": "ACTIVE",
                        "typeName": "hive_column",
                        "version": 0
                    },
                    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
                    "traitNames": [],
                    "traits": {},
                    "typeName": "hive_column",
                    "values": {
                        "comment": "partitioned columns comments.",
                        "name": "country",
                        "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.country@primary",
                        "table": {
                            "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                            "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                            "state": "ACTIVE",
                            "typeName": "hive_table",
                            "version": 0
                        },
                        "type": "string"
                    }
                }
            ],
            "retention": 0,
            "sd": {
                "id": {
                    "id": "6a7ce759-6dfa-4130-bde6-9bdeff64da39",
                    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                    "state": "ACTIVE",
                    "typeName": "hive_storagedesc",
                    "version": 0
                },
                "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
                "traitNames": [],
                "traits": {},
                "typeName": "hive_storagedesc",
                "values": {
                    "bucketCols": null,
                    "compressed": false,
                    "inputFormat": "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat",
                    "location": "hdfs://localhost:9000/user/hive/warehouse/db2pbrscdldkm.db/table_pbrscdldkm",
                    "numBuckets": -1,
                    "outputFormat": "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat",
                    "parameters": null,
                    "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm@primary_storage",
                    "serdeInfo": {
                        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                        "typeName": "hive_serde",
                        "values": {
                            "name": null,
                            "parameters": {
                                "serialization.format": "1"
                            },
                            "serializationLib": "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
                        }
                    },
                    "sortCols": null,
                    "storedAsSubDirectories": false,
                    "table": {
                        "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                        "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                        "state": "ACTIVE",
                        "typeName": "hive_table",
                        "version": 0
                    }
                }
            },
            "tableName": "table_pbrscdldkm",
            "tableType": "MANAGED_TABLE",
            "temporary": false,
            "viewExpandedText": null,
            "viewOriginalText": null
        }
    },
    "requestId": "qtp1576861390-13 - 7211cfb0-ad04-48d9-948a-4e8dbab65e17"
}
{noformat}


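Hive schema query response, which returns the same columns in a different order (viewtime, referrer_url, page_url, ip, userid)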
{noformat}
curl http://admin:admin@localhost:21000/api/atlas/lineage/hive/table/db2pbrscdldkm.table_pbrscdldkm@primary/schema | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2946    0  2946    0     0   4349      0 --:--:-- --:--:-- --:--:--  4351
{
    "requestId": "qtp1576861390-14 - b74c229b-02fc-4115-9f3f-187e949f3966",
    "results": {
        "dataType": {
            "attributeDefinitions": [
                {
                    "dataTypeName": "string",
                    "isComposite": false,
                    "isIndexable": true,
                    "isUnique": false,
                    "multiplicity": {
                        "isUnique": false,
                        "lower": 1,
                        "upper": 1
                    },
                    "name": "name",
                    "reverseAttributeName": null
                },
                {
                    "dataTypeName": "string",
                    "isComposite": false,
                    "isIndexable": true,
                    "isUnique": false,
                    "multiplicity": {
                        "isUnique": false,
                        "lower": 1,
                        "upper": 1
                    },
                    "name": "type",
                    "reverseAttributeName": null
                },
                {
                    "dataTypeName": "string",
                    "isComposite": false,
                    "isIndexable": true,
                    "isUnique": false,
                    "multiplicity": {
                        "isUnique": false,
                        "lower": 0,
                        "upper": 1
                    },
                    "name": "comment",
                    "reverseAttributeName": null
                },
                {
                    "dataTypeName": "hive_table",
                    "isComposite": false,
                    "isIndexable": true,
                    "isUnique": false,
                    "multiplicity": {
                        "isUnique": false,
                        "lower": 0,
                        "upper": 1
                    },
                    "name": "table",
                    "reverseAttributeName": "columns"
                }
            ],
            "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.ClassType",
            "superTypes": [
                "Referenceable"
            ],
            "typeDescription": null,
            "typeName": "hive_column"
        },
        "query": "hive_table where (name = \"db2pbrscdldkm.table_pbrscdldkm@primary\") columns",
        "rows": [
            {
                "$id$": {
                    "$typeName$": "hive_column",
                    "id": "f0115d35-c768-476b-917c-3a243085d1ff",
                    "state": "ACTIVE",
                    "version": 0
                },
                "$typeName$": "hive_column",
                "comment": null,
                "name": "viewtime",
                "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.viewtime@primary",
                "table": {
                    "$typeName$": "hive_table",
                    "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                    "state": "ACTIVE",
                    "version": 0
                },
                "type": "int"
            },
            {
                "$id$": {
                    "$typeName$": "hive_column",
                    "id": "8ca2072f-2b98-4b19-9a17-2e3d125ebbd6",
                    "state": "ACTIVE",
                    "version": 0
                },
                "$typeName$": "hive_column",
                "comment": null,
                "name": "referrer_url",
                "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.referrer_url@primary",
                "table": {
                    "$typeName$": "hive_table",
                    "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                    "state": "ACTIVE",
                    "version": 0
                },
                "type": "string"
            },
            {
                "$id$": {
                    "$typeName$": "hive_column",
                    "id": "9b14560e-6471-4a2e-b495-1f08bfad37d3",
                    "state": "ACTIVE",
                    "version": 0
                },
                "$typeName$": "hive_column",
                "comment": null,
                "name": "page_url",
                "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.page_url@primary",
                "table": {
                    "$typeName$": "hive_table",
                    "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                    "state": "ACTIVE",
                    "version": 0
                },
                "type": "string"
            },
            {
                "$id$": {
                    "$typeName$": "hive_column",
                    "id": "effd4c89-8795-4e54-bd26-f7a182d58c79",
                    "state": "ACTIVE",
                    "version": 0
                },
                "$typeName$": "hive_column",
                "comment": null,
                "name": "ip",
                "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.ip@primary",
                "table": {
                    "$typeName$": "hive_table",
                    "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                    "state": "ACTIVE",
                    "version": 0
                },
                "type": "string"
            },
            {
                "$id$": {
                    "$typeName$": "hive_column",
                    "id": "642b6b3a-1e5a-4a06-844e-6fd71ae036b2",
                    "state": "ACTIVE",
                    "version": 0
                },
                "$typeName$": "hive_column",
                "comment": null,
                "name": "userid",
                "qualifiedName": "db2pbrscdldkm.table_pbrscdldkm.userid@primary",
                "table": {
                    "$typeName$": "hive_table",
                    "id": "2d63c256-aee1-47f6-abdc-9db472764585",
                    "state": "ACTIVE",
                    "version": 0
                },
                "type": "bigint"
            }
        ]
    },
    "tableName": "db2pbrscdldkm.table_pbrscdldkm@primary"
}
{noformat}
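
To make the mismatch easy to check, here is a minimal Python sketch that pulls the column names out of both responses and compares their order. The helper names (entity_column_names, schema_column_names, reorder_schema_rows) are made up for illustration and are not part of Atlas. The two dictionaries are trimmed-down stand-ins for the JSON shown above, keeping only the fields the comparison needs. The reorder helper is only a possible client-side workaround, not a proposed fix for the server.

```python
def entity_column_names(entity_response):
    """Column names in the order listed under definition.values.columns."""
    columns = entity_response["definition"]["values"]["columns"]
    return [col["values"]["name"] for col in columns]


def schema_column_names(schema_response):
    """Column names in the order listed under results.rows."""
    return [row["name"] for row in schema_response["results"]["rows"]]


def reorder_schema_rows(schema_response, expected_order):
    """Hypothetical client-side workaround: sort rows to match expected_order."""
    position = {name: i for i, name in enumerate(expected_order)}
    rows = schema_response["results"]["rows"]
    return sorted(rows, key=lambda row: position[row["name"]])


# Trimmed stand-in for the entity response (order as returned above).
entity_response = {
    "definition": {
        "values": {
            "columns": [
                {"values": {"name": name}}
                for name in ["viewtime", "userid", "page_url", "referrer_url", "ip"]
            ]
        }
    }
}

# Trimmed stand-in for the schema response (order as returned above).
schema_response = {
    "results": {
        "rows": [
            {"name": name}
            for name in ["viewtime", "referrer_url", "page_url", "ip", "userid"]
        ]
    }
}

entity_order = entity_column_names(entity_response)
schema_order = schema_column_names(schema_response)

# Same set of columns, but a different order.
same_columns = sorted(entity_order) == sorted(schema_order)
same_order = entity_order == schema_order
print("same columns:", same_columns)
print("same order:", same_order)

# Rows re-sorted on the client to follow the entity response order.
fixed_order = [row["name"] for row in reorder_schema_rows(schema_response, entity_order)]
print("order after client-side sort:", fixed_order)
```

With real responses, the two dictionaries would be replaced by the parsed JSON from the curl calls above.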


