Posted to dev@hive.apache.org by Joseph Yen <jo...@gmail.com> on 2017/11/27 04:03:39 UTC

The response of GetResultSetMetadata is inconsistent with TCLIService.thrift for complex types

I was trying to add support for the decimal, timestamp, date, array, and
map types to the PyHive DBAPI. To parse a result set correctly, I have to
know the result set schema for each SELECT. For simple types (integer,
string, timestamp, decimal, ...) this is not a problem: I can get all the
information by calling HiveServer2's GetResultSetMetadata. For complex
types (array, map, struct), however, the nested type information is
missing, so I can't tell whether a column is an integer array or a string
array.

According to TCLIService.thrift
<https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188>,
recursively defined types such as array<int> and map<int, string> should be
described by TTypeEntry.arrayEntry or TTypeEntry.mapEntry, rather than
TTypeEntry.primitiveEntry, in the first element of TTypeDesc.types. The
nested types should reside in TTypeDesc.types as the following elements
and be pointed to from the first element.
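To make the pointer scheme concrete, here is a minimal Python sketch of how a client could walk such a type list. The dataclasses are hypothetical stand-ins for the Thrift-generated classes (only the fields needed here are modelled), and the two type ids are the ones that appear in the responses quoted in this message (3 for int, 7 for string). It is an illustration of my reading of the thrift file, not PyHive code.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the Thrift-generated classes.
@dataclass
class TPrimitiveTypeEntry:
    type: int


@dataclass
class TArrayTypeEntry:
    objectTypePtr: int


@dataclass
class TMapTypeEntry:
    keyTypePtr: int
    valueTypePtr: int


@dataclass
class TTypeEntry:
    primitiveEntry: Optional[TPrimitiveTypeEntry] = None
    arrayEntry: Optional[TArrayTypeEntry] = None
    mapEntry: Optional[TMapTypeEntry] = None


# Small subset of TTypeId, using the ids seen in the quoted responses.
TYPE_NAMES = {3: "int", 7: "string"}


def describe(types: List[TTypeEntry], ptr: int = 0) -> str:
    """Render the type rooted at types[ptr] by following integer pointers."""
    entry = types[ptr]
    if entry.arrayEntry is not None:
        return "array<%s>" % describe(types, entry.arrayEntry.objectTypePtr)
    if entry.mapEntry is not None:
        key = describe(types, entry.mapEntry.keyTypePtr)
        value = describe(types, entry.mapEntry.valueTypePtr)
        return "map<%s,%s>" % (key, value)
    if entry.primitiveEntry is not None:
        type_id = entry.primitiveEntry.type
        return TYPE_NAMES.get(type_id, "type#%d" % type_id)
    raise ValueError("empty TTypeEntry at index %d" % ptr)


# map<int, string>: element 0 is the map, elements 1 and 2 are its children.
map_int_string = [
    TTypeEntry(mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2)),
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3)),
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7)),
]
print(describe(map_int_string))  # map<int,string>
```

With a list laid out this way, the element type of an array or map is always recoverable from the first element.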

However, when I actually called GetResultSetMetadata for the query
SELECT array(1, 2, 3), I got just a single TTypeEntry.primitiveEntry in
TTypeDesc.types, with TPrimitiveTypeEntry.type = ARRAY_TYPE.

This violates both of the following descriptions in the thrift file:
“TTypeDesc employs a type list that maps integer “pointers” to TTypeEntry
objects”
<https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188>
and “The primitive type token. This must satisfy the condition that type is
in the PRIMITIVE_TYPES set.”
<https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L210-L215>
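The second condition can be checked mechanically on the client side. The sketch below flags any primitiveEntry whose type id is one of the non-primitive ids. The numeric ids are an assumption: they follow the order of the TTypeId enum as I read it in the thrift file and should be verified against the generated constants. SimpleNamespace objects stand in for the Thrift response objects, and the helper name and the column name are made up for the example.

```python
from types import SimpleNamespace as NS

# Assumed ids, following the TTypeId enum order; verify against the
# generated Thrift constants before relying on them.
COMPLEX_TYPE_IDS = {
    10: "ARRAY_TYPE",
    11: "MAP_TYPE",
    12: "STRUCT_TYPE",
    13: "UNION_TYPE",
    14: "USER_DEFINED_TYPE",
}


def find_mislabelled(columns):
    """Return (column name, entry index, type name) for every primitiveEntry
    that carries a type id outside the primitive set."""
    problems = []
    for col in columns:
        for index, entry in enumerate(col.typeDesc.types):
            prim = entry.primitiveEntry
            if prim is not None and prim.type in COMPLEX_TYPE_IDS:
                problems.append(
                    (col.columnName, index, COMPLEX_TYPE_IDS[prim.type])
                )
    return problems


# Shape of the response described above: one primitiveEntry with ARRAY_TYPE.
# "col" is a placeholder column name.
columns = [
    NS(columnName="col",
       typeDesc=NS(types=[NS(primitiveEntry=NS(type=10))])),
]
print(find_mislabelled(columns))  # [('col', 0, 'ARRAY_TYPE')]
```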

I tried the following script.

create temporary table dummy(a int);
insert into table dummy values (1), (2), (3);
create temporary table tt(a int, b string, c map<INT, ARRAY<string>>);
insert into table tt select 1, 'a', map(3, array('a','b','c')) from dummy limit 1;
select * from tt;

I then called GetResultSetMetadata right after executing the SELECT query.
The value of response.schema.columns was

[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(
  types=[
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3,
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7,
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=11,
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=3, comment=None)]

However, according to the thrift file, it should be

[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3,
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7,
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=None, arrayEntry=None,
mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2),
structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3,
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
  TTypeEntry(primitiveEntry=None,
arrayEntry=TArrayTypeEntry(objectTypePtr=3), mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7,
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)
]), position=3, comment=None)]
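The element order I would expect for tt.c (parent first, children in the slots it points to) can be produced by a breadth-first flattening of the nested type. The sketch below shows that layout only. The nested-tuple input format and the function name are invented for the example, and the thrift file does not, as far as I can tell, mandate this particular ordering, only that the pointers resolve.

```python
from collections import deque


def flatten(type_tree):
    """Flatten a nested type into a pointer-based list, parent before children.

    type_tree is a hypothetical nested tuple: ("int",), ("string",),
    ("array", element) or ("map", key, value).
    """
    entries = [None]                      # slot 0 holds the top-level type
    pending = deque([(0, type_tree)])
    while pending:
        index, node = pending.popleft()
        kind, children = node[0], node[1:]
        pointers = []
        for child in children:
            entries.append(None)          # reserve the child's slot now
            pointers.append(len(entries) - 1)
            pending.append((pointers[-1], child))
        entries[index] = (kind, *pointers)
    return entries


# map<int, array<string>>, the type of column tt.c
tt_c = ("map", ("int",), ("array", ("string",)))
for i, entry in enumerate(flatten(tt_c)):
    print(i, entry)
# 0 ('map', 1, 2)
# 1 ('int',)
# 2 ('array', 3)
# 3 ('string',)
```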

I found the related function in the Hive codebase:
https://github.com/apache/hive/blob/release-1.2.1/service/src/java/org/apache/hive/service/cli/TypeDescriptor.java#L66-L76
It seems that this function always puts a TPrimitiveTypeEntry into
TTypeDesc.types, even for complex types like array and map, which is
inconsistent with the thrift file.