Posted to issues@drill.apache.org by "Sorabh Hamirwasia (JIRA)" <ji...@apache.org> on 2018/01/21 20:19:02 UTC

[jira] [Comment Edited] (DRILL-6100) Intermittent failure while reading Parquet file footer during planning phase

    [ https://issues.apache.org/jira/browse/DRILL-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333381#comment-16333381 ] 

Sorabh Hamirwasia edited comment on DRILL-6100 at 1/21/18 8:18 PM:
-------------------------------------------------------------------

During investigation we found that Parquet data is sometimes accessed as the query user instead of the process user, so the FileSystem throws an AccessControlException when it tries to open the file. At planning time, Drill builds the Parquet metadata cache by reading the footer of each data file. The readFooter API in the Parquet library obtains its FileSystem instance from a cache keyed on the URI of the file path. The cache key also includes the current user, obtained from UserGroupInformation.getCurrentUser(), and this call returns different users at different times (query user, view owner or process user). To make sure the footer is always read in the context of the process user, Drill should call the [readFooter API|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L421] inside a doAs block of the process user's UGI, at least during planning. Using the process user is acceptable here because the metadata file is also created as the process user and is shared by Drill across queries. A separate JIRA should be created to review all the scan operators as well, to confirm that data is read through the correct FileSystem instance.
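The cache-key behavior and the proposed doAs fix can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not Hadoop's or Drill's code: `FsCacheSketch`, `SimulatedFs` and `fileSystemForFooterRead` are hypothetical names, and a `ThreadLocal` stands in for `UserGroupInformation.getCurrentUser()` (Hadoop actually derives the current user from the JAAS Subject of the current access-control context). The model shows that the same URI resolves to a different cached instance per user, and that wrapping the lookup in a process-user `doAs` always yields the process user's instance.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Simplified model only; NOT Hadoop's FileSystem cache or UserGroupInformation.
final class FsCacheSketch {
    // Hypothetical stand-in for the thread's current user; defaults to the process user.
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "processUser");

    // Cached "file systems", keyed by URI scheme + authority + current user.
    private final Map<String, SimulatedFs> cache = new HashMap<>();

    static String currentUser() {
        return CURRENT_USER.get();
    }

    // Runs the action with `user` as the current user, then restores the previous user
    // (a rough analogue of running code inside a UGI doAs block).
    static <T> T doAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // Returns the cached instance for (scheme, authority, current user), creating it on first use.
    SimulatedFs get(URI uri) {
        String key = uri.getScheme() + "://" + Objects.toString(uri.getAuthority(), "")
                + "|" + currentUser();
        return cache.computeIfAbsent(key, k -> new SimulatedFs(currentUser()));
    }

    // Proposed behavior: resolve the file system used for the footer read as the
    // process user, whatever user the calling thread is currently running as.
    SimulatedFs fileSystemForFooterRead(URI uri) {
        return doAs("processUser", () -> get(uri));
    }
}

// A file system handle bound to the user it was created for.
final class SimulatedFs {
    final String boundUser;

    SimulatedFs(String boundUser) {
        this.boundUser = boundUser;
    }
}
```

In this model, a lookup made while the thread runs as `queryUser` gets a `queryUser`-bound instance, which is the situation that leads to the AccessControlException; `fileSystemForFooterRead` returns the process-user instance even when called from a query-user context.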


was (Author: shamirwasia):
During investigation we found that Parquet data is sometimes accessed as the query user instead of the view owner, so the FileSystem throws an AccessControlException when it tries to open the file. At planning time, Drill builds the Parquet metadata cache by reading the footer of each data file. The readFooter API in the Parquet library obtains its FileSystem instance from a cache keyed on the URI of the file path. The cache key also includes the current user, obtained from UserGroupInformation.getCurrentUser(), and this call returns different users at different times (query user, view owner or process user). To make sure this call always returns the view owner, Drill should call the [readFooter API|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L421] inside a doAs block of the view owner's UGI.

> Intermittent failure while reading Parquet file footer during planning phase
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-6100
>                 URL: https://issues.apache.org/jira/browse/DRILL-6100
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.10.0
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sorabh Hamirwasia
>            Priority: Major
>             Fix For: 1.13.0
>
>
> When multiple users run queries against a view that references a Parquet data file, intermittent failures are seen during the planning phase. The failure happens when the Parquet data file, which the view owner has access to, is read to create the metadata cache. The query user has no direct access to the Parquet data file but has read access to the view, which in turn accesses the actual data. The Parquet metadata file is created as the process user (per DRILL-4143), but the footer is not read in the process user's context. While running concurrent queries from several clients, sporadic failures were observed because the footer was at times read as the query user, who does not have access to the file.
>  
> {code:java}
> 2018-01-12 13:19:57,267
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failure creating scan.
>  at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:92) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:70) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:63) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.planner.logical.DrillScanRule.onMatch(DrillScanRule.java:37) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
>  ... 15 common frames omitted
>  Caused by: org.apache.hadoop.security.AccessControlException: Open failed for file: /env/test/data/final
>  at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:265) ~[maprfs-5.2.2-mapr.jar:5.2.2-mapr]
>  at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:938) ~[maprfs-5.2.2-mapr.jar:5.2.2-mapr]
>  at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:803) ~[hadoop-common-2.7.0-mapr-1607.jar:na]
>  at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:425) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
>  at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
>  at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:395) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.Metadata.access$000(Metadata.java:85) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:323) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:311) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:56) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:122) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:285) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:264) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:249) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:121) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:733) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:230) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:190) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:169) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:67) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:146) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:100) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:90) ~[drill-java-exec-1.10.0.jar:1.10.0]
>  ... 19 common frames omitted
>  org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AccessControlException: Open failed for file: /env/test/data/final/snapshot_period_id=1234567/000000_0, error: Permission denied (13)
> {code}
>  
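The failure scenario from the description (a data file readable by the view owner and the process user, but not by the query user) can be reproduced deterministically in a small model. This is a hypothetical sketch, not a real MapR-FS or HDFS permission API: `FooterAclSketch`, `READERS`, `readFooter` and `AccessDenied` are illustrative names, the permission table is assumed, and a `ThreadLocal` stands in for the user the planning thread happens to run as. The outcome of the footer read depends only on which user is current, which is why it appears intermittent when that user varies between queries.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Simplified model of the reported failure; NOT a real file system permission API.
final class FooterAclSketch {
    static final String DATA_FILE = "/env/test/data/final/snapshot_period_id=1234567/000000_0";

    // Assumed permissions: the view owner and the process user may open the data file;
    // the query user may only read the view, not the file itself.
    private static final Map<String, Set<String>> READERS =
            Map.of(DATA_FILE, Set.of("viewOwner", "processUser"));

    // Hypothetical stand-in for the user the planning thread is currently running as.
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "processUser");

    static final class AccessDenied extends RuntimeException {
        AccessDenied(String message) {
            super(message);
        }
    }

    // Runs the action as `user`, restoring the previous user afterwards.
    static <T> T runAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // Simulated footer read: succeeds only if the current user may open the file.
    static String readFooter(String path) {
        String user = CURRENT_USER.get();
        if (!READERS.getOrDefault(path, Set.of()).contains(user)) {
            throw new AccessDenied("Open failed for file: " + path + " (user " + user + ")");
        }
        return "footer of " + path + " read as " + user;
    }
}
```

Reading the footer as `viewOwner` or `processUser` succeeds, while the same read as `queryUser` throws `AccessDenied`. Nesting a `runAs("processUser", ...)` inside the query-user context makes the read succeed, mirroring the fix proposed in the comment.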



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)