Posted to issues@drill.apache.org by "Dave Challis (JIRA)" <ji...@apache.org> on 2018/08/06 13:21:00 UTC
[jira] [Updated] (DRILL-6670) Error in parquet record reader - previously readable file fails to be read in 1.14
[ https://issues.apache.org/jira/browse/DRILL-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dave Challis updated DRILL-6670:
--------------------------------
Description:
A Parquet file generated by PyArrow was readable in Apache Drill 1.12 and 1.13, but cannot be read by 1.14.
Running the query "SELECT * FROM dfs.`foo.parquet`" results in the following error message from the Drill web query UI:
{code}
Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: INTERNAL_ERROR ERROR: Error in parquet record reader. Message: Failure in setting up reader Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema { optional binary name (UTF8); optional binary creation_parameters (UTF8); optional int64 creation_date (TIMESTAMP_MICROS); optional int32 data_version; optional int32 schema_version; } , metadata: {pandas={"index_columns": [], "column_indexes": [], "columns": [{"name": "name", "field_name": "name", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_parameters", "field_name": "creation_parameters", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_date", "field_name": "creation_date", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "data_version", "field_name": "data_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}, {"name": "schema_version", "field_name": "schema_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}], "pandas_version": "0.22.0"}}}, blocks: [BlockMetaData{1, 27142 [ColumnMetaData{SNAPPY [name] optional binary name (UTF8) [PLAIN, RLE], 4}, ColumnMetaData{SNAPPY [creation_parameters] optional binary creation_parameters (UTF8) [PLAIN, RLE], 252}, ColumnMetaData{SNAPPY [creation_date] optional int64 creation_date (TIMESTAMP_MICROS) [PLAIN, RLE], 46334}, ColumnMetaData{SNAPPY [data_version] optional int32 data_version [PLAIN, RLE], 46478}, ColumnMetaData{SNAPPY [schema_version] optional int32 schema_version [PLAIN, RLE], 46593}]}]} Fragment 0:0 [Error Id: bdb2e4d5-5982-4cc6-b95e-244782f827d2 on f9d0456cddd2:31010]
{code}
was:
A Parquet file generated by PyArrow was readable in Apache Drill 1.12 and 1.13, but cannot be read by 1.14.
Running the query "SELECT * FROM dfs.`foo.parquet`" results in the following error message from the Drill web query UI:
{code}
{"code":500,"message":"SQL error while querying Drill DB Failed to create prepared statement: INTERNAL_ERROR ERROR: Error in parquet record reader.\nMessage: Failure in setting up reader\nParquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {\n optional binary name (UTF8);\n optional binary creation_parameters (UTF8);\n optional int64 creation_date (TIMESTAMP_MICROS);\n optional int32 data_version;\n optional int32 schema_version;\n}\n, metadata: {pandas={\"index_columns\": [], \"column_indexes\": [], \"columns\": [{\"name\": \"name\", \"field_name\": \"name\", \"pandas_type\": \"unicode\", \"numpy_type\": \"object\", \"metadata\": null}, {\"name\": \"creation_parameters\", \"field_name\": \"creation_parameters\", \"pandas_type\": \"unicode\", \"numpy_type\": \"object\", \"metadata\": null}, {\"name\": \"creation_date\", \"field_name\": \"creation_date\", \"pandas_type\": \"datetime\", \"numpy_type\": \"datetime64[ns]\", \"metadata\": null}, {\"name\": \"data_version\", \"field_name\": \"data_version\", \"pandas_type\": \"int32\", \"numpy_type\": \"int32\", \"metadata\": null}, {\"name\": \"schema_version\", \"field_name\": \"schema_version\", \"pandas_type\": \"int32\", \"numpy_type\": \"int32\", \"metadata\": null}], \"pandas_version\": \"0.22.0\"}}}, blocks: [BlockMetaData{1, 27142 [ColumnMetaData{SNAPPY [name] optional binary name (UTF8) [PLAIN, RLE], 4}, ColumnMetaData{SNAPPY [creation_parameters] optional binary creation_parameters (UTF8) [PLAIN, RLE], 252}, ColumnMetaData{SNAPPY [creation_date] optional int64 creation_date (TIMESTAMP_MICROS) [PLAIN, RLE], 46334}, ColumnMetaData{SNAPPY [data_version] optional int32 data_version [PLAIN, RLE], 46478}, ColumnMetaData{SNAPPY [schema_version] optional int32 schema_version [PLAIN, RLE], 46593}]}]}\n\nFragment 0:0\n\n[Error Id: 7c76ae97-03e3-4fab-9125-ec19fc572bf5 on f9d0456cddd2:31010]"}
{code}
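Both error messages above embed the file's Parquet schema as a single line of text. To list its columns and their annotations (for example, which ones carry `UTF8` or `TIMESTAMP_MICROS`), a minimal stdlib-only Python sketch could look like the following. `parse_parquet_schema` and `ParquetColumn` are hypothetical helper names, not part of Drill or Parquet, and the sketch assumes a flat schema with simple physical types like the one in this file; nested groups and `fixed_len_byte_array(n)` columns are not handled.

```python
import re
from typing import List, NamedTuple, Optional


class ParquetColumn(NamedTuple):
    repetition: str              # "optional", "required" or "repeated"
    physical_type: str           # e.g. "binary", "int32", "int64"
    name: str                    # column name as written in the schema
    logical_type: Optional[str]  # annotation such as "UTF8" or "TIMESTAMP_MICROS"; None if absent


# Matches one leaf field of a flat schema, e.g. "optional int64 creation_date (TIMESTAMP_MICROS);"
_FIELD_RE = re.compile(
    r"\b(optional|required|repeated)\s+(\w+)\s+(\w+)(?:\s*\((\w+)\))?\s*;"
)


def parse_parquet_schema(error_text: str) -> List[ParquetColumn]:
    """Return the leaf columns found in the 'message <name> { ... }' part of an error message.

    Assumes a flat schema (no nested groups), so the first closing brace ends the schema.
    Returns an empty list if no schema section is present.
    """
    match = re.search(r"\bmessage\s+\w+\s*\{(.*?)\}", error_text, re.DOTALL)
    if match is None:
        return []
    return [
        # findall yields "" for a missing annotation; normalise that to None
        ParquetColumn(rep, ptype, name, logical or None)
        for rep, ptype, name, logical in _FIELD_RE.findall(match.group(1))
    ]


if __name__ == "__main__":
    # Excerpt copied from the Drill 1.14 error message above
    excerpt = (
        "ParquetMetaData{FileMetaData{schema: message schema { "
        "optional binary name (UTF8); optional binary creation_parameters (UTF8); "
        "optional int64 creation_date (TIMESTAMP_MICROS); optional int32 data_version; "
        "optional int32 schema_version; } , metadata: {pandas={...}}}"
    )
    for col in parse_parquet_schema(excerpt):
        print(col.name, col.physical_type, col.logical_type or "-")
```

The same helper also accepts the JSON-escaped variant from the earlier description, because its literal `\n` sequences only appear between fields. It only lists what the file declares; it does not by itself show which column triggers the 1.14 failure.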
> Error in parquet record reader - previously readable file fails to be read in 1.14
> ----------------------------------------------------------------------------------
>
> Key: DRILL-6670
> URL: https://issues.apache.org/jira/browse/DRILL-6670
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.14.0
> Reporter: Dave Challis
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)