Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/08/09 14:09:00 UTC

[jira] [Commented] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException

    [ https://issues.apache.org/jira/browse/DRILL-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903921#comment-16903921 ] 

ASF GitHub Bot commented on DRILL-4517:
---------------------------------------

arina-ielchiieva commented on pull request #1839: DRILL-4517: Support reading empty Parquet files
URL: https://github.com/apache/drill/pull/1839
 
 
   1. Modified the flat and complex Parquet readers to output only the schema when the requested number of records to read is 0. In this case the readers are not initialized, which improves performance.
   2. Allowed reading the requested number of rows instead of all rows in the row group (DRILL-6528).
   3. Fixed how the number of nulls in a row group is determined (fixed the IsPredicate#isAllNulls method).
   4. Allowed reading empty Parquet files by adding an empty (fake) row group.
   5. General refactoring and unit tests.
   
   Jira - [DRILL-4517](https://issues.apache.org/jira/browse/DRILL-4517).
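
A rough sketch of how items 1, 2 and 4 could fit together, for readers who do not want to open the diff. This is not the code from the pull request: `EmptyRowGroupSketch`, `Batch`, `recordsToRead` and `read` are hypothetical names, and an empty file is modeled here simply as a row group holding zero rows.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names are Drill classes.
final class EmptyRowGroupSketch {

    /** Result of one read: the schema is always present, the row count may be 0. */
    static final class Batch {
        final List<String> schema;
        final long rowCount;
        final boolean readersInitialized;

        Batch(List<String> schema, long rowCount, boolean readersInitialized) {
            this.schema = schema;
            this.rowCount = rowCount;
            this.readersInitialized = readersInitialized;
        }
    }

    /** Read at most the requested number of rows, never more than the row group holds. */
    static long recordsToRead(long requested, long rowsInGroup) {
        if (requested < 0 || rowsInGroup < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return Math.min(requested, rowsInGroup);
    }

    /** Returns a schema-only batch when there is nothing to read. */
    static Batch read(List<String> schema, long requested, long rowsInGroup) {
        long toRead = recordsToRead(requested, rowsInGroup);
        if (toRead == 0) {
            // Schema-only path: column readers are never set up.
            return new Batch(schema, 0, false);
        }
        // A real reader would initialize column readers and fill vectors here.
        return new Batch(schema, toRead, true);
    }

    public static void main(String[] args) {
        List<String> schema =
            Arrays.asList("MEMBER_ACCOUNT_ID", "TIMESTAMP_IN_HOUR", "APPLICATION_ID");
        Batch empty = read(schema, 100, 0);    // fake row group with zero rows
        Batch limited = read(schema, 10, 500); // only the requested 10 rows
        System.out.println(empty.schema + " rows=" + empty.rowCount);
        System.out.println(limited.schema + " rows=" + limited.rowCount);
    }
}
```

In this model the zero-row case and the "0 records requested" case take the same branch, which is one plausible reason the two changes were made together.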
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Reading empty Parquet file fails with java.lang.IllegalArgumentException
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4517
>                 URL: https://issues.apache.org/jira/browse/DRILL-4517
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components:  Server
>            Reporter: Tobias
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.17.0
>
>         Attachments: empty.parquet, no_rows.parquet
>
>
> When querying a Parquet file that has a schema but no rows, the Drill server fails with the error below.
> This looks similar to DRILL-3557.
> {noformat}
> {{ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT {
>   required int64 MEMBER_ACCOUNT_ID;
>   required int64 TIMESTAMP_IN_HOUR;
>   optional int64 APPLICATION_ID;
> }
> , metadata: {}}}, blocks: []}
> {noformat}
> {noformat}
> Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read entries assigned
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) ~[guava-14.0.1.jar:na]
>         at org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.physical.config.Project.accept(Project.java:51) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134) ~[drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) [drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) [drill-java-exec-1.5.0.jar:1.5.0]
>         at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) [drill-java-exec-1.5.0.jar:1.5.0]
> {noformat}
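
The trace above is consistent with the following minimal model of the failure, offered as an assumption rather than as Drill's actual code. A file whose metadata reports `blocks: []` contributes no row groups, so minor fragment 0 is left with no read entries and the argument check throws. `NoReadEntriesSketch`, `assign` and `specificScan` are hypothetical names, and the local `checkArgument` is a stand-in for Guava's `Preconditions.checkArgument`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the failure; not Drill's ParquetGroupScan implementation.
final class NoReadEntriesSketch {

    /** Local stand-in for Guava's Preconditions.checkArgument(boolean, Object). */
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Spreads row groups round-robin over the given number of minor fragments. */
    static List<List<String>> assign(List<String> rowGroups, int fragments) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < fragments; i++) {
            out.add(new ArrayList<>());
        }
        for (int i = 0; i < rowGroups.size(); i++) {
            out.get(i % fragments).add(rowGroups.get(i));
        }
        return out;
    }

    /** Fails when the fragment received no work, as in the reported stack trace. */
    static List<String> specificScan(List<List<String>> assignment, int minorFragmentId) {
        List<String> entries = assignment.get(minorFragmentId);
        checkArgument(!entries.isEmpty(),
            "MinorFragmentId " + minorFragmentId + " has no read entries assigned");
        return entries;
    }

    public static void main(String[] args) {
        // An empty file: zero row groups, one minor fragment.
        List<List<String>> assignment = assign(Collections.<String>emptyList(), 1);
        try {
            specificScan(assignment, 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `MinorFragmentId 0 has no read entries assigned`, the same message as in the report.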



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)