You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/10/21 20:42:27 UTC

[jira] [Updated] (DRILL-3961) CONVERT_FROM should return INTEGER when converting from TINYINT/SMALLINT types

     [ https://issues.apache.org/jira/browse/DRILL-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victoria Markman updated DRILL-3961:
------------------------------------
    Priority: Critical  (was: Major)

> CONVERT_FROM should return INTEGER when converting from TINYINT/SMALLINT types
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-3961
>                 URL: https://issues.apache.org/jira/browse/DRILL-3961
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Victoria Markman
>            Priority: Critical
>
> We've disabled SMALLINT, TINYINT SQL types because we had bugs that we could not fix before 1.0.0 Some things were left in sort of half implemented state and we should either fix up these things or re-implement TINYINT/SMALLINT types.
> * We can't cast to TINYINT/SMALLINT:
> {code}
> 0: jdbc:drill:schema=dfs> select cast(1 as tinyint) from sys.options limit 1;
> Error: UNSUPPORTED_OPERATION ERROR: TINYINT is not supported
> See Apache Drill JIRA: DRILL-1959
> [Error Id: c86bb4f5-4ea9-47cc-856e-ce1e96463477 on atsqa4-133.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs> select cast(1 as smallint) from sys.options limit 1;
> Error: UNSUPPORTED_OPERATION ERROR: SMALLINT is not supported
> See Apache Drill JIRA: DRILL-1959
> [Error Id: 14cc008a-532b-4f02-a9ad-6a54cf5dc6bc on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> * We hacked our implementation to read TINTINT/SMALLINT in hive as INTEGER: https://issues.apache.org/jira/browse/DRILL-3263
>    Which works. Now, if you try querying hive table with TINYINT/SMALLINT attribute, you get INTEGER in Drill.
> Hive table, notice c9/c10 columns are TINYINT/SMALLINT respectively:
> {code}
> hive> describe alltypes;
> OK
> c1                      int
> c2                      boolean
> c3                      double
> c4                      string
> c5                      array<int>
> c6                      map<int,string>
> c7                      map<string,string>
> c8                      struct<r:string,s:int,t:double>
> c9                      tinyint
> c10                     smallint
> c11                     float
> c12                     bigint
> c13                     array<array<string>>
> c14                     map<int,map<int,int>>
> c15                     struct<r:int,s:struct<a:int,b:string>>
> c16                     array<struct<m:map<string,string>,n:int>>
> c17                     timestamp
> c18                     decimal(10,0)
> c19                     binary
> c20                     date
> Time taken: 0.094 seconds, Fetched: 20 row(s)
> {code}
> Same table accessed from Drill:
> {code}
> 0: jdbc:drill:schema=dfs> use hive;
> +-------+-----------------------------------+
> |  ok   |              summary              |
> +-------+-----------------------------------+
> | true  | Default schema changed to [hive]  |
> +-------+-----------------------------------+
> 1 row selected (0.337 seconds)
> 0: jdbc:drill:schema=dfs> describe alltypes;
> +--------------+--------------------+--------------+
> | COLUMN_NAME  |     DATA_TYPE      | IS_NULLABLE  |
> +--------------+--------------------+--------------+
> | c1           | INTEGER            | YES          |
> | c2           | BOOLEAN            | YES          |
> | c3           | DOUBLE             | YES          |
> | c4           | CHARACTER VARYING  | YES          |
> | c5           | ARRAY              | NO           |
> | c6           | MAP                | NO           |
> | c7           | MAP                | NO           |
> | c8           | ROW                | NO           |
> | c9           | INTEGER            | YES          |
> | c10          | INTEGER            | YES          |
> | c11          | FLOAT              | YES          |
> | c12          | BIGINT             | YES          |
> | c13          | ARRAY              | NO           |
> | c14          | MAP                | NO           |
> | c15          | ROW                | NO           |
> | c16          | ARRAY              | NO           |
> | c17          | TIMESTAMP          | YES          |
> | c18          | DECIMAL            | YES          |
> | c19          | BINARY VARYING     | YES          |
> | c20          | DATE               | YES          |
> +--------------+--------------------+--------------+
> 20 rows selected (1.379 seconds)
> {code}
> * However, if you try reading parquet file with TINYINT/SMALLINT types (generated by impala for instance), CONVERT_FROM function still returns SMALLINT/TINYINT types.
> I can successfully read SMALLINT field from impala generated parquet file:
> {code}
> 0: jdbc:drill:schema=dfs> select asset_id from `impala.parquet` limit 1;
> +--------------+
> |   asset_id   |
> +--------------+
> | [B@22a3aec1  |
> +--------------+
> 1 row selected (0.298 seconds)
> {code}
> Using CONVERT_FROM function:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') from `impala.parquet` limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | 19535   |
> +---------+
> 1 row selected (0.283 seconds)
> {code}
> Use of this function in an arithmetic expression results in an error:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') + 1 from `impala.parquet` limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castINT(SMALLINT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: b5e063ee-85e0-48f6-9ca2-09dd9d6a84e6 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> Attempt to cast to integer fails for the same reason:
> {code}
> 0: jdbc:drill:schema=dfs> select CAST(CONVERT_FROM(asset_id, 'SMALLINT') as integer) from `impala.parquet` limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castBIGINT(SMALLINT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 70820c35-f5fe-4229-8897-cafe0db5bdef on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> Trying to 'cheat it' and read as INT:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'INT') from `impala.parquet` limit 1;
> Error: SYSTEM ERROR: IllegalArgumentException: Wrong length 2(2-0) in the buffer 'OL', expected 4.
> Fragment 0:0
> [Error Id: 47e0a0d7-fc40-4f3f-8d68-4bed63ac0a60 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> The only thing I can do in order to make use of this column in a query is to write it to a table:
> {code}
> 0: jdbc:drill:schema=dfs> create table x(c1) as select CONVERT_FROM(asset_id, 'SMALLINT') from `impala.parquet` limit 1;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1                          |
> +-----------+----------------------------+
> 1 row selected (0.568 seconds)
> [Tue Oct 20 15:57:14] # ~/parquet-tools/parquet-schema 0_0_0.parquet
> message root {
>   optional int32 c1;
> }
> {code}
> *  Probably the same with Hive UDFs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)