Posted to dev@hive.apache.org by "Joseph Yen (JIRA)" <ji...@apache.org> on 2017/11/29 08:30:00 UTC

[jira] [Created] (HIVE-18176) The response of GetResultSetMetadata is inconsistent with TCLIService.thrift for complex types

Joseph Yen created HIVE-18176:
---------------------------------

             Summary: The response of GetResultSetMetadata is inconsistent with TCLIService.thrift for complex types
                 Key: HIVE-18176
                 URL: https://issues.apache.org/jira/browse/HIVE-18176
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
         Environment: HDP Hive 1.2.1
PyHive master(commit b68e1a8dcc9917feb10281af70ff6bd29c764cdd)
            Reporter: Joseph Yen


I was trying to add decimal, timestamp, date, array, and map type support to the PyHive DBAPI. To parse the result set correctly, I need to know the result set schema for each SELECT. For simple types (integer, string, timestamp, decimal, …) this is not a problem: I can get all the information by calling HiveServer2's GetResultSetMetadata. But for complex types (array, map, struct), the nested type information is missing. I can't find a way to tell whether a column is an integer array or a string array from the response of GetResultSetMetadata.

According to [TCLIService.thrift|https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188], recursively defined types such as {{array<int>}} and {{map<int, string>}} should be described by {{TTypeEntry.arrayEntry}} and {{TTypeEntry.mapEntry}} rather than {{TTypeEntry.primitiveEntry}} in the first element of {{TTypeDesc.types}}. The nested types should reside in {{TTypeDesc.types}} as the following elements, and be pointed to from the first element.
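
To make the pointer layout concrete, here is a minimal Python sketch of how a nested type could be flattened into such a type list. The {{Primitive}}, {{Array}}, and {{Map}} tuples and the {{flatten}} helper are hypothetical stand-ins written for this illustration; they are not the Thrift-generated classes and not how HiveServer2 builds the list. The numeric ids 3 and 7 are the INT and STRING values that appear in the responses quoted in this report.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Thrift structs; only the pointer fields are modelled.
Primitive = namedtuple("Primitive", ["type"])
Array = namedtuple("Array", ["objectTypePtr"])
Map = namedtuple("Map", ["keyTypePtr", "valueTypePtr"])

INT_TYPE, STRING_TYPE = 3, 7  # type ids as they appear in the responses in this report


def flatten(node, types=None):
    """Append `node` and its nested types to `types`; return (index, types)."""
    if types is None:
        types = []
    index = len(types)
    types.append(None)  # reserve the slot so a parent always precedes its children
    kind = node[0]
    if kind == "primitive":
        types[index] = Primitive(node[1])
    elif kind == "array":
        elem_ptr, _ = flatten(node[1], types)
        types[index] = Array(elem_ptr)
    elif kind == "map":
        key_ptr, _ = flatten(node[1], types)
        value_ptr, _ = flatten(node[2], types)
        types[index] = Map(key_ptr, value_ptr)
    else:
        raise ValueError("unsupported kind: %r" % (kind,))
    return index, types


# map<int, array<string>> expressed as a nested tuple (an ad hoc notation for this sketch)
nested = ("map", ("primitive", INT_TYPE), ("array", ("primitive", STRING_TYPE)))
_, type_list = flatten(nested)
for i, entry in enumerate(type_list):
    print(i, entry)
```

The first element describes the outermost type, and every pointer is simply an index into the same list, which is the layout the thrift comments describe.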

However, when I actually called {{GetResultSetMetadata}} for the query {{SELECT array(1, 2, 3)}}, I got just a single {{TTypeEntry.primitiveEntry}} element in {{TTypeDesc.types}}, with {{TPrimitiveTypeEntry.type = ARRAY_TYPE}}.

This response violates both of the following descriptions in TCLIService.thrift:
bq. [“TTypeDesc employs a type list that maps integer “pointers” to TTypeEntry objects”|https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188] 
and 
bq. [“The primitive type token. This must satisfy the condition that type is in the PRIMITIVE_TYPES set.”|https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L210-L215]
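
A client can detect this violation with a simple check. The sketch below is only an illustration: the helper name {{misplaced_complex_type}} is made up, and the numeric ids are assumed from the order of the {{TTypeId}} enum in TCLIService.thrift (they agree with the {{type=11}} reported for the map column in this issue).

```python
# TTypeId values of the complex types (assumed from the enum order in TCLIService.thrift).
COMPLEX_TYPE_IDS = {
    10: "ARRAY_TYPE",
    11: "MAP_TYPE",
    12: "STRUCT_TYPE",
    13: "UNION_TYPE",
    14: "USER_DEFINED_TYPE",
}


def misplaced_complex_type(primitive_type_id):
    """Return the complex type name if the id is not a legal primitive token, else None."""
    return COMPLEX_TYPE_IDS.get(primitive_type_id)


# The three type ids seen in TPrimitiveTypeEntry.type in this report.
for type_id in (3, 7, 11):
    name = misplaced_complex_type(type_id)
    if name is None:
        print(type_id, "ok")
    else:
        print(type_id, "violates PRIMITIVE_TYPES:", name)
```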

----
I tried the following script.

{code:sql}
create temporary table dummy(a int);
insert into table dummy values (1), (2), (3);
create temporary table tt(a int,  b string, c map<INT, ARRAY<string>>);
insert into table tt select 1, 'a', map(3, array('a','b','c')) from dummy limit 1;
select * from tt;
{code}

And called {{GetResultSetMetadata}} right after executing the SELECT query.
The value of {{response.schema.columns}} was

{code:javascript}
[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(
  types=[
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]), position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]), position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=11, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]), position=3, comment=None)]
{code}

However, according to the thrift file, it should be
{code:javascript}
[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]), position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]), position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=None, arrayEntry=None, mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2), structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
  TTypeEntry(primitiveEntry=None, arrayEntry=TArrayTypeEntry(objectTypePtr=3), mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)
]), position=3, comment=None)]
{code}
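
With a response shaped like the expected one, a client could recover the full type by following the pointers. The following is a rough sketch, not PyHive code: the namedtuples are simplified stand-ins for the Thrift-generated classes (only three of the six {{TTypeEntry}} fields are modelled), and {{type_string}} and {{TYPE_NAMES}} are hypothetical names introduced here.

```python
from collections import namedtuple

# Simplified stand-ins for the Thrift-generated classes (assumption: real ones have more fields).
TPrimitiveTypeEntry = namedtuple("TPrimitiveTypeEntry", ["type"])
TArrayTypeEntry = namedtuple("TArrayTypeEntry", ["objectTypePtr"])
TMapTypeEntry = namedtuple("TMapTypeEntry", ["keyTypePtr", "valueTypePtr"])
TTypeEntry = namedtuple(
    "TTypeEntry",
    ["primitiveEntry", "arrayEntry", "mapEntry"],
    defaults=(None, None, None),
)

TYPE_NAMES = {3: "int", 7: "string"}  # only the two primitive ids used in this report


def type_string(types, ptr=0):
    """Resolve the entry at index `ptr` of a TTypeDesc-style list into a type name."""
    entry = types[ptr]
    if entry.primitiveEntry is not None:
        return TYPE_NAMES[entry.primitiveEntry.type]
    if entry.arrayEntry is not None:
        return "array<%s>" % type_string(types, entry.arrayEntry.objectTypePtr)
    if entry.mapEntry is not None:
        key = type_string(types, entry.mapEntry.keyTypePtr)
        value = type_string(types, entry.mapEntry.valueTypePtr)
        return "map<%s,%s>" % (key, value)
    raise ValueError("entry %d has no supported type set" % ptr)


# The expected type list for column tt.c, as given above.
types_c = [
    TTypeEntry(mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2)),
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3)),
    TTypeEntry(arrayEntry=TArrayTypeEntry(objectTypePtr=3)),
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7)),
]
print(type_string(types_c))  # map<int,array<string>>
```

With the response HiveServer2 actually returns, the same walk stops at the first entry and can only report "map", which is the missing information described in this issue.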

----
I found the related function in the Hive codebase:
https://github.com/apache/hive/blob/release-1.2.1/service/src/java/org/apache/hive/service/cli/TypeDescriptor.java#L66-L76
It seems that this function always puts a {{TPrimitiveTypeEntry}} into {{TTypeDesc.types}}, even for complex types (like array and map), which is inconsistent with the thrift file.



