Posted to issues@drill.apache.org by "Rahul Challapalli (JIRA)" <ji...@apache.org> on 2016/02/02 00:03:39 UTC

[jira] [Comment Edited] (DRILL-4337) Drill fails to read INT96 fields from hive generated parquet files

    [ https://issues.apache.org/jira/browse/DRILL-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127207#comment-15127207 ] 

Rahul Challapalli edited comment on DRILL-4337 at 2/1/16 11:03 PM:
-------------------------------------------------------------------

Failure1 :
{code}
org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
Message:
Hadoop path: /drill/testdata/hive_storage/hive1_fewtypes_null/hive1_fewtypes_null.parquet
Total records read: 0
Mock records read: 0
Records to read: 21
Row group index: 0
Records in row group: 21
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message hive_schema {
  optional int32 int_col;
  optional int64 bigint_col;
  optional binary date_col (UTF8);
  optional binary time_col (UTF8);
  optional int96 timestamp_col;
  optional binary interval_col (UTF8);
  optional binary varchar_col (UTF8);
  optional float float_col;
  optional double double_col;
  optional boolean bool_col;
}
, metadata: {}}, blocks: [BlockMetaData{21, 1886 [ColumnMetaData{UNCOMPRESSED [int_col] INT32  [RLE, BIT_PACKED, PLAIN], 4}, ColumnMetaData{UNCOMPRESSED [bigint_col] INT64  [RLE, BIT_PACKED, PLAIN], 111}, ColumnMetaData{UNCOMPRESSED [date_col] BINARY  [RLE, BIT_PACKED, PLAIN], 298}, ColumnMetaData{UNCOMPRESSED [time_col] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 563}, ColumnMetaData{UNCOMPRESSED [timestamp_col] INT96  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 793}, ColumnMetaData{UNCOMPRESSED [interval_col] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1031}, ColumnMetaData{UNCOMPRESSED [varchar_col] BINARY  [RLE, BIT_PACKED, PLAIN], 1189}, ColumnMetaData{UNCOMPRESSED [float_col] FLOAT  [RLE, BIT_PACKED, PLAIN], 1543}, ColumnMetaData{UNCOMPRESSED [double_col] DOUBLE  [RLE, BIT_PACKED, PLAIN], 1654}, ColumnMetaData{UNCOMPRESSED [bool_col] BOOLEAN  [RLE, BIT_PACKED, PLAIN], 1851}]}]}
        at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:349) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:451) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:191) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71]
        at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1506.jar:na]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.NegativeArraySizeException: null
        at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:97) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
        at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:66) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
        at org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:86) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
        at org.apache.drill.exec.store.parquet.columnreaders.NullableFixedByteAlignedReaders$NullableFixedBinaryReader.readField(NullableFixedByteAlignedReaders.java:94) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:153) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:390) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:433) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        ... 19 common frames omitted
{code}
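For context on the NegativeArraySizeException above: Parquet's RLE/bit-packing hybrid encoding prefixes each run with an unsigned LEB128 varint header; an even header means an RLE run (length = header >>> 1), an odd header means bit-packed groups. If the decoder is handed bytes that are not actually in this encoding (e.g. a dictionary-encoded INT96 page whose layout the reader misjudged), the header it reads can imply a bogus, even negative, buffer size. A simplified sketch of the run-header logic (hypothetical names, not Drill's or parquet-mr's actual code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HybridRunHeader {
    /** Reads an unsigned LEB128 varint, as used for RLE/bit-packed run headers. */
    static int readUnsignedVarInt(InputStream in) throws IOException {
        int value = 0, shift = 0, b;
        while (((b = in.read()) & 0x80) != 0) {
            value |= (b & 0x7F) << shift;
            shift += 7;
        }
        return value | (b << shift);
    }

    public static void main(String[] args) throws IOException {
        // Header byte 0x06 = 0b110: LSB is 0, so this is an RLE run
        // of length 0x06 >>> 1 = 3.
        InputStream data = new ByteArrayInputStream(new byte[]{0x06});
        int header = readUnsignedVarInt(data);
        boolean isRle = (header & 1) == 0;
        int runLength = header >>> 1;
        System.out.println(isRle + " " + runLength); // true 3
    }
}
```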



> Drill fails to read INT96 fields from hive generated parquet files
> ------------------------------------------------------------------
>
>                 Key: DRILL-4337
>                 URL: https://issues.apache.org/jira/browse/DRILL-4337
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: hive1_fewtypes_null.parquet
>
>
> git.commit.id.abbrev=576271d
> Cluster : 2 nodes running MaprFS 4.1
> The data file used in the query below was generated from Hive. Below is the output from running the same query multiple times.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from hive1_fewtypes_null;
> Error: SYSTEM ERROR: NegativeArraySizeException
> Fragment 0:0
> [Error Id: 5517e983-ccae-4c96-b09c-30f331919e56 on qa-node191.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from hive1_fewtypes_null;
> Error: SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.
> Fragment 0:0
> [Error Id: 94ed5996-d2ac-438d-b460-c2d2e41bdcc3 on qa-node191.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from hive1_fewtypes_null;
> Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0
> Fragment 0:0
> [Error Id: 41dca093-571e-49e5-a2ab-fd69210b143d on qa-node191.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:zk=10.10.100.190:5181> select timestamp_col from hive1_fewtypes_null;
> +----------------+
> | timestamp_col  |
> +----------------+
> | null           |
> | [B@7c766115    |
> | [B@3fdfe989    |
> | null           |
> | [B@55d4222     |
> | [B@2da0c8ee    |
> | [B@16e798a9    |
> | [B@3ed78afe    |
> | [B@38e649ed    |
> | [B@16ff83ca    |
> | [B@61254e91    |
> | [B@5849436a    |
> | [B@31e9116e    |
> | [B@3c77665b    |
> | [B@42e0ff60    |
> | [B@419e19ed    |
> | [B@72b83842    |
> | [B@1c75afe5    |
> | [B@726ef1fb    |
> | [B@51d0d06e    |
> | [B@64240fb8    |
> +----------------+
> {code}
> Attached are the log, the Hive DDL used to generate the parquet file, and the parquet file itself.
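The `[B@...` values in the last run above are the INT96 timestamps surfacing as raw 12-byte arrays instead of decoded values. For reference, Hive and Impala write INT96 timestamps as 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes of Julian day number. A minimal decoding sketch under that assumption (class and method names are hypothetical, not Drill code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

public class Int96Decoder {
    // Julian day number of the Unix epoch, 1970-01-01.
    private static final long JULIAN_DAY_OF_EPOCH = 2440588L;

    /**
     * Decodes a 12-byte Parquet INT96 value as written by Hive/Impala:
     * 8 little-endian bytes of nanoseconds within the day, then
     * 4 little-endian bytes of Julian day number.
     */
    public static LocalDateTime decode(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        LocalDate date = LocalDate.ofEpochDay(julianDay - JULIAN_DAY_OF_EPOCH);
        return LocalDateTime.of(date, LocalTime.ofNanoOfDay(nanosOfDay));
    }

    public static void main(String[] args) {
        // 1970-01-01T00:00:00 encoded as INT96: nanos = 0, julianDay = 2440588.
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0L).putInt(2440588);
        System.out.println(decode(buf.array())); // 1970-01-01T00:00
    }
}
```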



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)