Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/11/28 02:01:58 UTC
[jira] [Created] (DRILL-5075) Tests complain about Parquet metadata
parse errors in Drill-created files
Paul Rogers created DRILL-5075:
----------------------------------
Summary: Tests complain about Parquet metadata parse errors in Drill-created files
Key: DRILL-5075
URL: https://issues.apache.org/jira/browse/DRILL-5075
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor
The test {{TestParquetWriter.testAllScalarTypes}} seems to create a Parquet file, then read it using the "new" Parquet reader. However, the test logs a warning with an exception stack trace (shown in full at the end of this report), although the test still succeeds.
Note that the exception does _not_ occur if we run the single test function by itself. It only occurs when run as part of the entire test class, suggesting an interaction between tests.
When run stand-alone, a different behavior occurs: only after the test completes and the Drillbit shuts down does Parquet log a bunch of "ColumnChunkPageWriteStore: written" messages followed by:
{code}
WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by is null or empty! See PARQUET-251 and PARQUET-297
{code}
Are we leaving a file open that is getting flushed only on shut-down?
Full error when the test runs in the entire suite:
{code}
PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.*) )?\(build ?(.*)\)
at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:66)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:264)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:568)
at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:545)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:455)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412)
at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:381)
at org.apache.drill.exec.store.parquet.Metadata.access$0(Metadata.java:379)
at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:316)
at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:1)
at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:56)
at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:122)
at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:278)
at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:257)
at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:242)
at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:118)
at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:733)
at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:230)
at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:190)
at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:169)
at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:1)
at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:145)
at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:103)
at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
at org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:65)
at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:168)
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:123)
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:97)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1008)
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
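To illustrate why the bare string "parquet-mr" is rejected, here is a minimal sketch that applies the format string quoted in the exception message to two {{created_by}} values. The class name {{CreatedByCheck}}, the helper {{isParseable}}, and the second sample string are hypothetical and exist only for illustration. The use of a full-string match is an assumption about how the parser applies the pattern, not something the log confirms.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill or parquet-mr.
class CreatedByCheck {
    // Format string copied from the exception message in the log above.
    static final Pattern FORMAT =
        Pattern.compile("(.+) version ((.*) )?\\(build ?(.*)\\)");

    // Returns true if the whole created_by string fits the format.
    // Assumption: the real parser requires a full match.
    static boolean isParseable(String createdBy) {
        if (createdBy == null || createdBy.isEmpty()) {
            // Corresponds to the "null or empty" warning seen at shutdown.
            return false;
        }
        return FORMAT.matcher(createdBy).matches();
    }

    public static void main(String[] args) {
        // Value reported in the stack trace.
        String fromLog = "parquet-mr";
        // Made-up value shaped like "<app> version <ver> (build <hash>)".
        String hypothetical = "parquet-mr version 1.8.1 (build abc123)";

        System.out.println(fromLog + " -> " + isParseable(fromLog));
        System.out.println(hypothetical + " -> " + isParseable(hypothetical));
    }
}
```

Under these assumptions, the value from the log fails because it lacks the " version " and "(build ...)" parts, which suggests the file was written with an incomplete {{created_by}} string.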
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)