Posted to issues@drill.apache.org by "Timothy Farkas (JIRA)" <ji...@apache.org> on 2018/08/22 23:58:00 UTC

[jira] [Updated] (DRILL-6609) Investigate Creation of FileSystem Configuration for Hive Parquet Files

     [ https://issues.apache.org/jira/browse/DRILL-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Farkas updated DRILL-6609:
----------------------------------
    Description: 
Currently, when reading a Parquet file in Hive, we try to speed things up by doing a native Parquet scan with HiveDrillNativeParquetRowGroupScan. When retrieving the FileSystem configuration in HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties defined for the HiveStoragePlugin. As a result, a misconfiguration in the HiveStoragePlugin could influence the configuration of our FileSystem.
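
To illustrate the concern, here is a minimal sketch of what happens when every plugin property is copied into the FileSystem settings without filtering. This is not Drill's actual code: the class FsConfSketch and the method mergeAll are hypothetical, and plain maps stand in for the Hadoop and Hive configuration objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; the real getFsConf works with
// Hadoop/Hive configuration objects rather than plain maps.
final class FsConfSketch {

    // Copies every storage plugin property on top of the base FileSystem
    // settings. Nothing is filtered, so a plugin property such as
    // "fs.default.name" silently replaces the base value.
    static Map<String, String> mergeAll(Map<String, String> base,
                                        Map<String, String> pluginProps) {
        Map<String, String> conf = new LinkedHashMap<>(base);
        conf.putAll(pluginProps);
        return conf;
    }
}
```

With a base setting that points at a distributed file system and a plugin property of "fs.default.name": "file:///", the merged result resolves paths against the local file system of whichever node runs the scan, which would be consistent with a query working on one node and failing on another.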

It is currently unclear whether this behavior was intended. If it was, we need to document why it was done. If it was not, we need to fix the issue.

This may be the root cause of the issue discovered by Chun.

To reproduce the issue:
1) Set up a cluster with two or more nodes.
2) Enable impersonation.
3) Set "fs.default.name": "file:///" in the Hive storage plugin.
4) Restart the drillbits.
5) As a regular user, on node A, drop the table/file.
6) Run a CTAS from a large enough Hive table as the source to recreate the table/file.
7) Query the table from node A; this should work.
8) Query the table from node B as the same user; this should reproduce the issue.

  was:
Currently, when reading a Parquet file in Hive, we try to speed things up by doing a native Parquet scan with HiveDrillNativeParquetRowGroupScan. When retrieving the FileSystem configuration in HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties defined for the HiveStoragePlugin. As a result, a misconfiguration in the HiveStoragePlugin could influence the configuration of our FileSystem.

It is currently unclear whether this behavior was intended. If it was, we need to document why it was done. If it was not, we need to fix the issue.


> Investigate Creation of FileSystem Configuration for Hive Parquet Files
> -----------------------------------------------------------------------
>
>                 Key: DRILL-6609
>                 URL: https://issues.apache.org/jira/browse/DRILL-6609
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Timothy Farkas
>            Priority: Major
>
> Currently, when reading a Parquet file in Hive, we try to speed things up by doing a native Parquet scan with HiveDrillNativeParquetRowGroupScan. When retrieving the FileSystem configuration in HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties defined for the HiveStoragePlugin. As a result, a misconfiguration in the HiveStoragePlugin could influence the configuration of our FileSystem.
> It is currently unclear whether this behavior was intended. If it was, we need to document why it was done. If it was not, we need to fix the issue.
> This may be the root cause of the issue discovered by Chun.
> To reproduce the issue:
> 1) Set up a cluster with two or more nodes.
> 2) Enable impersonation.
> 3) Set "fs.default.name": "file:///" in the Hive storage plugin.
> 4) Restart the drillbits.
> 5) As a regular user, on node A, drop the table/file.
> 6) Run a CTAS from a large enough Hive table as the source to recreate the table/file.
> 7) Query the table from node A; this should work.
> 8) Query the table from node B as the same user; this should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)