Posted to issues@drill.apache.org by "Dave Challis (JIRA)" <ji...@apache.org> on 2018/08/06 13:27:00 UTC

[jira] [Comment Edited] (DRILL-6670) Error in parquet record reader - previously readable file fails to be read in 1.14

    [ https://issues.apache.org/jira/browse/DRILL-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570201#comment-16570201 ] 

Dave Challis edited comment on DRILL-6670 at 8/6/18 1:26 PM:
-------------------------------------------------------------

From further digging in the logs, it looks like this is an issue with Parquet-to-Drill type conversion. This is the relevant stack trace:

{code}
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
Message: Failure in setting up reader
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
  optional binary name (UTF8);
  optional binary creation_parameters (UTF8);
  optional int64 creation_date (TIMESTAMP_MICROS);
  optional int32 data_version;
  optional int32 schema_version;
}
, metadata: {pandas={"index_columns": [], "column_indexes": [], "columns": [{"name": "name", "field_name": "name", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_parameters", "field_name": "creation_parameters", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_date", "field_name": "creation_date", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "data_version", "field_name": "data_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}, {"name": "schema_version", "field_name": "schema_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}], "pandas_version": "0.22.0"}}}, blocks: [BlockMetaData{1, 8394 [ColumnMetaData{SNAPPY [name] optional binary name (UTF8)  [PLAIN, RLE], 4}, ColumnMetaData{SNAPPY [creation_parameters] optional binary creation_parameters (UTF8)  [PLAIN, RLE], 162}, ColumnMetaData{SNAPPY [creation_date] optional int64 creation_date (TIMESTAMP_MICROS)  [PLAIN, RLE], 14197}, ColumnMetaData{SNAPPY [data_version] optional int32 data_version  [PLAIN, RLE], 14341}, ColumnMetaData{SNAPPY [schema_version] optional int32 schema_version  [PLAIN, RLE], 14456}]}]}
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleException(ParquetRecordReader.java:271) ~[drill-java-exec-1.14.0.jar:1.14.0]
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:255) ~[drill-java-exec-1.14.0.jar:1.14.0]
	at org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas(ScanBatch.java:251) [drill-java-exec-1.14.0.jar:1.14.0]
	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:169) [drill-java-exec-1.14.0.jar:1.14.0]
	... 40 common frames omitted
Caused by: java.lang.UnsupportedOperationException: unsupported type: INT64 TIMESTAMP_MICROS
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter.getMinorType(ParquetToDrillTypeConverter.java:70) ~[drill-java-exec-1.14.0.jar:1.14.0]
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter.toMajorType(ParquetToDrillTypeConverter.java:128) ~[drill-java-exec-1.14.0.jar:1.14.0]
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetColumnMetadata.resolveDrillType(ParquetColumnMetadata.java:61) ~[drill-java-exec-1.14.0.jar:1.14.0]
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema.loadParquetSchema(ParquetSchema.java:132) ~[drill-java-exec-1.14.0.jar:1.14.0]
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema.buildSchema(ParquetSchema.java:115) ~[drill-java-exec-1.14.0.jar:1.14.0]
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:250) ~[drill-java-exec-1.14.0.jar:1.14.0]
	... 42 common frames omitted
{code}

I couldn't see anything related to this in the release notes.
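
For anyone checking their own files, the schema printed in the "Parquet Metadata" part of the message can be scanned for the type named in the UnsupportedOperationException. The snippet below is only a rough sketch: the function name `find_annotated_columns` and the regular expression are made up for illustration, it assumes the schema text is pasted from the log with one field per line, and it does not handle parameterised annotations or nested groups.

```python
import re

# Schema text copied from the "Parquet Metadata" portion of the log above.
SCHEMA_TEXT = """
message schema {
  optional binary name (UTF8);
  optional binary creation_parameters (UTF8);
  optional int64 creation_date (TIMESTAMP_MICROS);
  optional int32 data_version;
  optional int32 schema_version;
}
"""

# One field per line: repetition, physical type, column name, optional annotation.
# Hypothetical pattern for illustration; it only covers simple, flat schemas.
FIELD_RE = re.compile(
    r"^\s*(optional|required|repeated)\s+(\w+)\s+(\w+)(?:\s+\((\w+)\))?;\s*$"
)


def find_annotated_columns(schema_text, physical_type, annotation):
    """Return names of columns whose physical type and annotation both match."""
    matches = []
    for line in schema_text.splitlines():
        m = FIELD_RE.match(line)
        if not m:
            continue  # skip "message schema {", "}", and blank lines
        _, ptype, name, ann = m.groups()
        if ptype.lower() == physical_type.lower() and ann == annotation:
            matches.append(name)
    return matches


if __name__ == "__main__":
    # The exception text reads "unsupported type: INT64 TIMESTAMP_MICROS".
    print(find_annotated_columns(SCHEMA_TEXT, "INT64", "TIMESTAMP_MICROS"))
```

For the schema in this report, the only column it flags is creation_date.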



> Error in parquet record reader - previously readable file fails to be read in 1.14
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-6670
>                 URL: https://issues.apache.org/jira/browse/DRILL-6670
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.14.0
>            Reporter: Dave Challis
>            Priority: Major
>
> A Parquet file generated by PyArrow was readable in Apache Drill 1.12 and 1.13, but fails to be read in 1.14.
> Running the query "SELECT * FROM dfs.`foo.parquet`" results in the following error message from the Drill web query UI:
> {code}
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: INTERNAL_ERROR ERROR: Error in parquet record reader. Message: Failure in setting up reader Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema { optional binary name (UTF8); optional binary creation_parameters (UTF8); optional int64 creation_date (TIMESTAMP_MICROS); optional int32 data_version; optional int32 schema_version; } , metadata: {pandas={"index_columns": [], "column_indexes": [], "columns": [{"name": "name", "field_name": "name", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_parameters", "field_name": "creation_parameters", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_date", "field_name": "creation_date", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "data_version", "field_name": "data_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}, {"name": "schema_version", "field_name": "schema_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}], "pandas_version": "0.22.0"}}}, blocks: [BlockMetaData{1, 27142 [ColumnMetaData{SNAPPY [name] optional binary name (UTF8) [PLAIN, RLE], 4}, ColumnMetaData{SNAPPY [creation_parameters] optional binary creation_parameters (UTF8) [PLAIN, RLE], 252}, ColumnMetaData{SNAPPY [creation_date] optional int64 creation_date (TIMESTAMP_MICROS) [PLAIN, RLE], 46334}, ColumnMetaData{SNAPPY [data_version] optional int32 data_version [PLAIN, RLE], 46478}, ColumnMetaData{SNAPPY [schema_version] optional int32 schema_version [PLAIN, RLE], 46593}]}]} Fragment 0:0 [Error Id: bdb2e4d5-5982-4cc6-b95e-244782f827d2 on f9d0456cddd2:31010] 
> {code}


