Posted to dev@drill.apache.org by "Vitalii Diravka (JIRA)" <ji...@apache.org> on 2018/07/12 15:45:00 UTC
[jira] [Resolved] (DRILL-4337) Drill fails to read INT96 fields from hive generated parquet files

     [ https://issues.apache.org/jira/browse/DRILL-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka resolved DRILL-4337.
------------------------------------
    Resolution: Cannot Reproduce

I can no longer reproduce this issue.

Reading the Parquet file with the dfs storage plugin:
{code:java}
0: jdbc:drill:zk=local> select timestamp_col from dfs.`/home/vitalii/Downloads/hive1_fewtypes_null.parquet` limit 2;
+----------------+
| timestamp_col  |
+----------------+
| null           |
| [B@2dc9e445    |
+----------------+
0: jdbc:drill:zk=local> select CONVERT_FROM(timestamp_col, 'TIMESTAMP_IMPALA') from dfs.`/home/vitalii/Downloads/hive1_fewtypes_null.parquet` limit 2;
+------------------------+
|         EXPR$0         |
+------------------------+
| null                   |
| 1997-01-02 02:00:00.0  |
+------------------------+
2 rows selected (0.129 seconds)
0: jdbc:drill:zk=local> set `store.parquet.reader.int96_as_timestamp` = true;
+-------+---------------------------------------------------+
|  ok   |                      summary                      |
+-------+---------------------------------------------------+
| true  | store.parquet.reader.int96_as_timestamp updated.  |
+-------+---------------------------------------------------+
1 row selected (0.306 seconds)
0: jdbc:drill:zk=local> select timestamp_col from dfs.`/home/vitalii/Downloads/hive1_fewtypes_null.parquet` limit 2;
+------------------------+
|     timestamp_col      |
+------------------------+
| null                   |
| 1997-01-02 02:00:00.0  |
+------------------------+
2 rows selected (0.218 seconds)
{code}
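For context on the output above: a value such as [B@2dc9e445 is the default Java string form of a byte array, that is, the raw 12-byte INT96 value before conversion. The following is a minimal sketch of how such a value decodes to a timestamp, assuming the usual Impala/Hive INT96 layout (8 bytes of little-endian nanoseconds within the day, followed by 4 bytes of little-endian Julian day number). The function name decode_int96_timestamp is hypothetical, the result is interpreted as UTC, and this is not Drill's actual implementation.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_EPOCH = 2440588


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp (assumed Impala/Hive layout) as UTC."""
    if len(raw) != 12:
        raise ValueError("INT96 timestamp must be exactly 12 bytes")
    # "<qi": little-endian int64 nanos-of-day, then little-endian int32 Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime has microsecond resolution, so sub-microsecond nanos are truncated.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)


if __name__ == "__main__":
    # Build a sample value for midnight on the day 9863 days after the epoch.
    sample = struct.pack("<qi", 0, JULIAN_DAY_OF_EPOCH + 9863)
    print(decode_int96_timestamp(sample))
```

The sketch does not apply any time zone adjustment; whether a reader shifts the decoded value to local time is a separate choice and is not modeled here.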
Reading the Hive table backed by the same Parquet file:
{code:java}
hive> CREATE TABLE nullable_types (int_col INT, bigint_col BIGINT, date_col STRING, time_col STRING, timestamp_col TIMESTAMP, interval_col STRING, varchar_col STRING, float_col FLOAT, double_col DOUBLE, bool_col BOOLEAN) STORED AS PARQUET;
OK

hive> select * from nullable_types limit 2;
OK
1	98723980547	NULL	00:00:00	NULL	P18582D	jllkjsdhfg	2345.33	NULL	false
NULL	24509823475	NULL	01:00:00	1997-01-02 00:00:00	P1DT9045S	jhgduitweriuoert	3243.32	664522.332	true
Time taken: 0.091 seconds, Fetched: 2 row(s)

0: jdbc:drill:> select * from hive.nullable_types limit 2;
+----------+--------------+-----------+-----------+------------------------+---------------+-------------------+------------+-------------+-----------+
| int_col  |  bigint_col  | date_col  | time_col  |     timestamp_col      | interval_col  |    varchar_col    | float_col  | double_col  | bool_col  |
+----------+--------------+-----------+-----------+------------------------+---------------+-------------------+------------+-------------+-----------+
| 1        | 98723980547  | null      | 00:00:00  | null                   | P18582D       | jllkjsdhfg        | 2345.33    | null        | false     |
| null     | 24509823475  | null      | 01:00:00  | 1997-01-02 00:00:00.0  | P1DT9045S     | jhgduitweriuoert  | 3243.32    | 664522.332  | true      |
+----------+--------------+-----------+-----------+------------------------+---------------+-------------------+------------+-------------+-----------+
{code}
The result is the same without the LIMIT operator.
I ran the query several times and got consistent output.
It looks like the issue was resolved by one of these tickets: DRILL-4373 or DRILL-6016.

> Drill fails to read INT96 fields from hive generated parquet files
> ------------------------------------------------------------------
>
>                 Key: DRILL-4337
>                 URL: https://issues.apache.org/jira/browse/DRILL-4337
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>            Priority: Blocker
>             Fix For: 1.15.0
>
>         Attachments: hive1_fewtypes_null.parquet
>
>
> git.commit.id.abbrev=576271d
> Cluster : 2 nodes running MaprFS 4.1
> The data file used in the below table is generated from hive. Below is output from running the same query multiple times. 
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from hive1_fewtypes_null;
> Error: SYSTEM ERROR: NegativeArraySizeException
> Fragment 0:0
> [Error Id: 5517e983-ccae-4c96-b09c-30f331919e56 on qa-node191.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from hive1_fewtypes_null;
> Error: SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.
> Fragment 0:0
> [Error Id: 94ed5996-d2ac-438d-b460-c2d2e41bdcc3 on qa-node191.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from hive1_fewtypes_null;
> Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0
> Fragment 0:0
> [Error Id: 41dca093-571e-49e5-a2ab-fd69210b143d on qa-node191.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from hive1_fewtypes_null;
> +----------------+
> | timestamp_col  |
> +----------------+
> | null           |
> | [B@7c766115    |
> | [B@3fdfe989    |
> | null           |
> | [B@55d4222     |
> | [B@2da0c8ee    |
> | [B@16e798a9    |
> | [B@3ed78afe    |
> | [B@38e649ed    |
> | [B@16ff83ca    |
> | [B@61254e91    |
> | [B@5849436a    |
> | [B@31e9116e    |
> | [B@3c77665b    |
> | [B@42e0ff60    |
> | [B@419e19ed    |
> | [B@72b83842    |
> | [B@1c75afe5    |
> | [B@726ef1fb    |
> | [B@51d0d06e    |
> | [B@64240fb8    |
> +----------------+
> {code}
> Attached are the log, the Hive DDL used to generate the Parquet file, and the Parquet file itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)