Posted to issues@drill.apache.org by "Vitalii Diravka (JIRA)" <ji...@apache.org> on 2017/12/07 09:35:00 UTC

[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark

    [ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281561#comment-16281561 ] 

Vitalii Diravka commented on DRILL-6016:
----------------------------------------

Interesting dataset.
By default, Drill reads INT96 as VARBINARY: https://drill.apache.org/docs/parquet-format/#sql-data-types-to-parquet
With the provided dataset, however, the query returns an error, even when the column is converted explicitly:
{code}
0: jdbc:drill:zk=local> select CONVERT_FROM(run_date, 'TIMESTAMP_IMPALA') from dfs.`/home/vitalii/Downloads/result/parquet/latest/part-r-00000-0c44161e-49e7-4b40-b4ab-c3d8e492bf33.snappy.parquet` limit 1; 
Error: DATA_READ ERROR: Error reading from Parquet file

File:  /home/vitalii/Downloads/result/parquet/latest/part-r-00000-0c44161e-49e7-4b40-b4ab-c3d8e492bf33.snappy.parquet
Column:  run_date
Row Group Start:  5523
Fragment 0:0
{code}

But the schema looks good:
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar schema /home/vitalii/Downloads/result/parquet/latest/part-r-00000-0c44161e-49e7-4b40-b4ab-c3d8e492bf33.snappy.parquet
message spark_schema {
  optional binary article_no (UTF8);
  optional binary qty (UTF8);
  required int96 run_date;
}
{code}
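
For reference, the `required int96 run_date` column in the schema above can be decoded by hand. The following is a minimal sketch, assuming the file uses the Impala/Spark INT96 timestamp convention: 12 bytes, with the nanoseconds within the day as a little-endian 64-bit integer followed by the Julian day number as a little-endian 32-bit integer. The function name `decode_int96_timestamp` is hypothetical and this is not Drill's implementation, only an illustration of the layout that CONVERT_FROM(..., 'TIMESTAMP_IMPALA') is expected to interpret.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number that corresponds to 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 value (assumed Impala/Spark layout) to a UTC datetime."""
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    # "<qi": little-endian, 8-byte nanoseconds of day, then 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Python datetimes have microsecond resolution, so sub-microsecond
    # nanoseconds are truncated here.
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
        days=days_since_epoch, microseconds=nanos_of_day // 1000
    )


if __name__ == "__main__":
    # Hand-built sample value: one hour into the day after the Unix epoch.
    sample = struct.pack("<qi", 3_600_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
    print(decode_int96_timestamp(sample).isoformat())
```

If raw values pulled from the file decode to plausible dates this way, the data itself is probably fine and the problem is more likely in the reader path.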

> Error reading INT96 created by Apache Spark
> -------------------------------------------
>
>                 Key: DRILL-6016
>                 URL: https://issues.apache.org/jira/browse/DRILL-6016
>             Project: Apache Drill
>          Issue Type: Bug
>         Environment: Drill 1.11
>            Reporter: Rahul Raj
>
> Hi,
> I am getting the error - SYSTEM ERROR : ClassCastException: org.apache.drill.exec.vector.TimeStampVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector while trying to read a Spark INT96 datetime field on Drill 1.11, in spite of setting the property store.parquet.reader.int96_as_timestamp to true.
> I believe this was fixed in Drill 1.10 (https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong?
> I have attached the dataset at https://github.com/rajrahul/files/blob/master/result.tar.gz


