Posted to issues@drill.apache.org by "Vitalii Diravka (JIRA)" <ji...@apache.org> on 2017/12/07 09:35:00 UTC
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281561#comment-16281561 ]
Vitalii Diravka commented on DRILL-6016:
----------------------------------------
Interesting dataset.
Drill reads INT96 by default as VARBINARY: https://drill.apache.org/docs/parquet-format/#sql-data-types-to-parquet
But with the provided dataset it returns an error, even when the column is converted explicitly:
{code}
0: jdbc:drill:zk=local> select CONVERT_FROM(run_date, 'TIMESTAMP_IMPALA') from dfs.`/home/vitalii/Downloads/result/parquet/latest/part-r-00000-0c44161e-49e7-4b40-b4ab-c3d8e492bf33.snappy.parquet` limit 1;
Error: DATA_READ ERROR: Error reading from Parquet file
File: /home/vitalii/Downloads/result/parquet/latest/part-r-00000-0c44161e-49e7-4b40-b4ab-c3d8e492bf33.snappy.parquet
Column: run_date
Row Group Start: 5523
Fragment 0:0
{code}
But the schema looks good:
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar schema /home/vitalii/Downloads/result/parquet/latest/part-r-00000-0c44161e-49e7-4b40-b4ab-c3d8e492bf33.snappy.parquet
message spark_schema {
  optional binary article_no (UTF8);
  optional binary qty (UTF8);
  required int96 run_date;
}
{code}
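For reference, the run_date column above is a raw 12-byte INT96 value. By the convention used by Impala and Spark, the first 8 bytes hold the nanoseconds within the day and the last 4 bytes hold the Julian day number, both little-endian. The Python sketch below decodes one such value. It is only an illustration of that layout, not Drill's implementation: the function name decode_int96_timestamp is hypothetical, the result is treated as UTC, and Drill's own conversion (including any time zone handling) may differ.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp (hypothetical helper, not a Drill API).

    Assumed layout: 8 bytes nanoseconds-of-day, then 4 bytes Julian day,
    both little-endian.
    """
    if len(raw) != 12:
        raise ValueError(f"INT96 value must be 12 bytes, got {len(raw)}")
    # "<qi" = little-endian signed 64-bit integer followed by signed 32-bit integer.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Python datetimes have microsecond resolution, so sub-microsecond
    # nanoseconds are truncated here.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)
```

For example, decode_int96_timestamp(struct.pack("<qi", 0, 2440588)) yields midnight on 1970-01-01 UTC. A value that does not decode to a plausible date under this layout would suggest the file's INT96 data is not what the reader expects.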
> Error reading INT96 created by Apache Spark
> -------------------------------------------
>
> Key: DRILL-6016
> URL: https://issues.apache.org/jira/browse/DRILL-6016
> Project: Apache Drill
> Issue Type: Bug
> Environment: Drill 1.11
> Reporter: Rahul Raj
>
> Hi,
> I am getting the error "SYSTEM ERROR : ClassCastException: org.apache.drill.exec.vector.TimeStampVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector" while trying to read a Spark INT96 datetime field on Drill 1.11, even though the property store.parquet.reader.int96_as_timestamp is set to true.
> I believe this was fixed in Drill 1.10 (https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong?
> I have attached the dataset at https://github.com/rajrahul/files/blob/master/result.tar.gz