Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/11/28 02:01:58 UTC

[jira] [Created] (DRILL-5075) Tests complain about Parquet metadata parse errors in Drill-created files

Paul Rogers created DRILL-5075:
----------------------------------

             Summary: Tests complain about Parquet metadata parse errors in Drill-created files
                 Key: DRILL-5075
                 URL: https://issues.apache.org/jira/browse/DRILL-5075
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Priority: Minor


The test {{TestParquetWriter.testAllScalarTypes}} seems to create a Parquet file, then read it using the "new" Parquet reader. However, the test logs the exception shown at the end of this report (though the test still succeeds).

Note that the exception does _not_ occur if we run the single test function by itself. It only occurs when run as part of the entire test class, suggesting an interaction between tests.

When the test is run stand-alone, a different behavior occurs: only once the test completes and the Drillbit shuts down does Parquet log a bunch of "ColumnChunkPageWriteStore: written" messages, followed by:

{code}
WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by is null or empty! See PARQUET-251 and PARQUET-297
{code}

Are we leaving a file open that is getting flushed only on shut-down?

Full error when the test runs in the entire suite:

{code}
PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.*) )?\(build ?(.*)\)
	at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
	at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:66)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:264)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:568)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:545)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:455)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:381)
	at org.apache.drill.exec.store.parquet.Metadata.access$0(Metadata.java:379)
	at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:316)
	at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:1)
	at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:56)
	at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:122)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:278)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:257)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:242)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:118)
	at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:733)
	at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:230)
	at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:190)
	at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:169)
	at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:1)
	at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:145)
	at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:103)
	at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
	at org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:65)
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
	at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:168)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:123)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:97)
	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1008)
	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
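
To illustrate why the parse fails, here is a minimal sketch that applies the format string quoted in the exception message to the bare {{created_by}} value "parquet-mr". The class and method names are hypothetical, the second sample string (version "1.8.1", build "abc123") is invented for illustration, and the sketch assumes the parser requires the whole string to match; it is not the actual {{VersionParser}} code.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill or parquet-mr.
class CreatedByCheck {
    // Format string copied from the VersionParseException message in this report.
    static final Pattern FORMAT =
        Pattern.compile("(.+) version ((.*) )?\\(build ?(.*)\\)");

    // Assumption: the whole created_by string must match the format.
    static boolean matchesFormat(String createdBy) {
        return FORMAT.matcher(createdBy).matches();
    }

    public static void main(String[] args) {
        // The value reported in the warning: no " version ..." or "(build ...)" part.
        System.out.println(matchesFormat("parquet-mr")); // prints false

        // Invented example of a string shaped the way the format expects.
        System.out.println(
            matchesFormat("parquet-mr version 1.8.1 (build abc123)")); // prints true
    }
}
```

If this reading is right, the file's footer carries only the application name in {{created_by}}, without the version and build parts that the format requires, which would be consistent with the warning above.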


