Posted to issues@drill.apache.org by "Arina Ielchiieva (JIRA)" <ji...@apache.org> on 2019/08/02 15:09:00 UTC

[jira] [Commented] (DRILL-5075) Tests complain about Parquet metadata parse errors in Drill-created files

    [ https://issues.apache.org/jira/browse/DRILL-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898968#comment-16898968 ] 

Arina Ielchiieva commented on DRILL-5075:
-----------------------------------------

Since Drill 1.8, the Parquet version has been upgraded, so such warnings are no longer observed. Please reopen if they are seen again.

> Tests complain about Parquet metadata parse errors in Drill-created files
> -------------------------------------------------------------------------
>
>                 Key: DRILL-5075
>                 URL: https://issues.apache.org/jira/browse/DRILL-5075
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The test {{TestParquetWriter.testAllScalarTypes}} seems to create a Parquet file, then read it using the "new" Parquet reader. However, the test reports the parse exception shown at the end of this description (though the test still succeeds).
> Note that the exception does _not_ occur if we run the single test function by itself. It only occurs when run as part of the entire test class, suggesting an interaction between tests.
> When run stand-alone, another behavior occurs. When the test is complete, and the Drillbit shuts down, only then does Parquet log a bunch of "ColumnChunkPageWriteStore: written" messages followed by:
> {code}
> WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by is null or empty! See PARQUET-251 and PARQUET-297
> {code}
> Are we leaving a file open that is getting flushed only on shut-down?
> Full error when the test runs in the entire suite:
> {code}
> PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
> org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.*) )?\(build ?(.*)\)
> 	at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
> 	at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:66)
> 	at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:264)
> 	at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:568)
> 	at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:545)
> 	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:455)
> 	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412)
> 	at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:381)
> 	at org.apache.drill.exec.store.parquet.Metadata.access$0(Metadata.java:379)
> 	at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:316)
> 	at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:1)
> 	at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:56)
> 	at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:122)
> 	at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:278)
> 	at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:257)
> 	at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:242)
> 	at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:118)
> 	at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:733)
> 	at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:230)
> 	at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:190)
> 	at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:169)
> 	at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:1)
> 	at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:145)
> 	at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:103)
> 	at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
> 	at org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:65)
> 	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
> 	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
> 	at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:168)
> 	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:123)
> 	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:97)
> 	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1008)
> 	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
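
For readers who want to see why the bare value "parquet-mr" is rejected, here is a minimal sketch that applies the format string quoted in the exception message to a created_by value. It is not the actual VersionParser code; the class name CreatedByCheck, the method matchesFormat, and the second sample string (including its version and build values) are hypothetical and exist only for illustration.

```java
import java.util.regex.Pattern;

public class CreatedByCheck {
    // Format string copied verbatim from the VersionParseException message above.
    static final Pattern FORMAT =
        Pattern.compile("(.+) version ((.*) )?\\(build ?(.*)\\)");

    // Returns true when the whole created_by value fits the expected format.
    static boolean matchesFormat(String createdBy) {
        return createdBy != null && FORMAT.matcher(createdBy).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "parquet-mr",                              // value reported in the log
            "parquet-mr version 1.8.1 (build abc123)"  // hypothetical well-formed value
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matchesFormat(s));
        }
    }
}
```

The bare name has no " version " segment and no "(build ...)" suffix, so the pattern cannot match it. That is consistent with the warning that the statistics are ignored because created_by could not be parsed.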



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)