Posted to issues@drill.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/05/26 01:32:00 UTC

[jira] [Commented] (DRILL-7934) NullPointerException error when reading parquet files

    [ https://issues.apache.org/jira/browse/DRILL-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351422#comment-17351422 ] 

ASF GitHub Bot commented on DRILL-7934:
---------------------------------------

cdmikechen opened a new pull request #2238:
URL: https://github.com/apache/drill/pull/2238


   # [DRILL-7934](https://issues.apache.org/jira/browse/DRILL-7934): NullPointerException error when reading parquet files
   
   ## Description
   
   Change `null` to `org.apache.drill.common.types.Types.NULL` to avoid a NullPointerException.
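   
   A minimal sketch of the idea, assuming the assignment quoted in the JIRA issue is the one being changed (the authoritative change is this PR's diff):
   
   ```java
   // When columnMetadata carries no type, store the non-null sentinel
   // Types.NULL instead of null, so that the later comparison
   // partitionColTypeMap.get(schemaPath).equals(type) cannot throw an NPE.
   TypeProtos.MajorType majorType = columnMetadata != null
       ? columnMetadata.majorType()
       : Types.NULL;   // was: null
   ```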
   
   ## Documentation
   This is a bug fix; it does not change the Drill documentation.
   
   ## Testing
   Tested in a standalone Drill cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> NullPointerException error when reading parquet files
> -----------------------------------------------------
>
>                 Key: DRILL-7934
>                 URL: https://issues.apache.org/jira/browse/DRILL-7934
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.18.0
>         Environment: Drill 1.18 
> Ambari 2.7.4
> Spark 3.0.2
>            Reporter: cdmikechen
>            Priority: Critical
>             Fix For: 1.19.0
>
>
> I created a dataset using Spark ML. When I query this dataset folder with Drill 1.18, it reports this error:
> {code}
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Error while applying rule DrillScanRule, args [rel#29:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default, /home/spark/dataset/default/test2/*.parquet])]
> 	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:301)
> 	... 3 common frames omitted
> Caused by: java.lang.RuntimeException: Error while applying rule DrillScanRule, args [rel#29:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default, /home/spark/dataset/default/test2/*.parquet])]
> 	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:235)
> 	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:633)
> 	at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:327)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:405)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:351)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:245)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:308)
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:173)
> 	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
> 	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
> 	at org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:140)
> 	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
> 	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
> 	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
> 	... 3 common frames omitted
> Caused by: java.lang.NullPointerException: null
> 	at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.lambda$getPartitionsMetadata$7(BaseParquetMetadataProvider.java:354)
> 	at java.util.Map.forEach(Map.java:630)
> 	at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getPartitionsMetadata(BaseParquetMetadataProvider.java:342)
> 	at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.initializeMetadata(BaseParquetMetadataProvider.java:206)
> 	at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.init(BaseParquetMetadataProvider.java:170)
> 	at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:95)
> 	at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:48)
> 	at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl$Builder.build(ParquetTableMetadataProviderImpl.java:415)
> 	at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:150)
> 	at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:120)
> 	at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:202)
> 	at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:79)
> 	at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:226)
> 	at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:209)
> 	at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:119)
> 	at org.apache.drill.exec.planner.common.DrillScanRelBase.<init>(DrillScanRelBase.java:51)
> 	at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:76)
> 	at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:65)
> 	at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:58)
> 	at org.apache.drill.exec.planner.logical.DrillScanRule.onMatch(DrillScanRule.java:38)
> 	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
> 	... 16 common frames omitted
> {code}
> It is the same as issue https://issues.apache.org/jira/browse/DRILL-7769.
> I added some log output and found this:
> {code}
> TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `features`.`values`.`list`.`element` with major type null
>  current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, `features`.`type`=minor_type: TINYINT
> mode: REQUIRED
> , `features`.`size`=minor_type: INT
> mode: OPTIONAL
> }
> 2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `label` with major type minor_type: FLOAT8
> mode: REQUIRED
>  current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, `features`.`type`=minor_type: TINYINT
> mode: REQUIRED
> , `features`.`size`=minor_type: INT
> mode: OPTIONAL
> }
> 2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `features`.`size` with major type minor_type: INT
> mode: OPTIONAL
>  current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, `features`.`type`=minor_type: TINYINT
> mode: REQUIRED
> , `features`.`size`=minor_type: INT
> mode: OPTIONAL
> }
> {code}
> So there is a condition in which the major type is null. When Drill executes this code, it throws a NullPointerException:
> {code:java}
> TypeProtos.MajorType majorType = columnMetadata != null ? columnMetadata.majorType() : null; // line 121
> !partitionColTypeMap.get(schemaPath).equals(type) // line 189
> {code}
> We need to change the null to *org.apache.drill.common.types.Types.NULL* to avoid the NullPointerException.
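>
> The sketch below is illustrative only, in plain Java rather than the Drill classes: it reproduces the pattern above with a map that holds a null value, shows why the direct equals() call throws, and why a non-null sentinel (the role Types.NULL would play) or a null-safe comparison avoids it.
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> import java.util.Objects;
>
> public class NullTypeSentinelDemo {
>   // Stand-in sentinel playing the role of org.apache.drill.common.types.Types.NULL.
>   private static final String NULL_TYPE = "NULL";
>
>   public static void main(String[] args) {
>     Map<String, String> partitionColTypeMap = new HashMap<>();
>     // Mirrors the trace output above: one column's major type was resolved to null.
>     partitionColTypeMap.put("`features`.`indices`.`list`.`element`", null);
>
>     String type = "FLOAT8:REQUIRED";
>     String stored = partitionColTypeMap.get("`features`.`indices`.`list`.`element`");
>
>     // stored.equals(type) would throw a NullPointerException here, as in the report.
>     // Either store a sentinel instead of null ...
>     String storedWithSentinel = stored != null ? stored : NULL_TYPE;
>     System.out.println(!storedWithSentinel.equals(type));   // true, no NPE
>
>     // ... or compare null-safely.
>     System.out.println(!Objects.equals(stored, type));      // true, no NPE
>   }
> }
> {code}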



--
This message was sent by Atlassian Jira
(v8.3.4#803005)