Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/10/28 04:13:00 UTC

[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

    [ https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223220#comment-16223220 ] 

ASF GitHub Bot commented on DRILL-4990:
---------------------------------------

Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/652
  
    This pull request was never merged because of a problem with our Windows test setup. As a workaround, I added code to fall back to the old API if the new API fails for any reason. All tests pass with this change.
    It would be good to include this in 1.12, as it improves performance for all DFS-based queries, especially when there is a large number of files.
    Can we review the new diffs, please?


> Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4990
>                 URL: https://issues.apache.org/jira/browse/DRILL-4990
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.8.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>
> For every query, we build the schema tree (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all storage plugins are checked and added to the schema tree if they are accessible to the user who initiated the query. For the file system plugin, the listStatus API is used to check whether the workspace is accessible to the user (WorkspaceSchemaFactory.accessible). The idea seems to be that if the user does not have access to the file(s) in the workspace, listStatus throws an exception and we return false. However, listStatus (which lists all the entries of a directory) is an expensive operation when the directory contains a large number of files. Hadoop 2.6 added a new API called access (HDFS-6570), which checks whether the user has permissions on a file or directory. Use this new API instead of listStatus.
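
The check described above, combined with the fallback mentioned in the comment, could look roughly like the sketch below. This is not the code from the pull request. To keep it self-contained, WorkspaceFs is a hypothetical stand-in interface for Hadoop's FileSystem (whose real calls are FileSystem.access(Path, FsAction) and FileSystem.listStatus(Path)), and java.nio.file.AccessDeniedException stands in for Hadoop's AccessControlException. The class and method names are also assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;

/** Sketch: cheap permission check first, directory listing only as a fallback. */
class WorkspaceAccessCheck {

    /** Hypothetical stand-in for the two Hadoop FileSystem calls discussed above. */
    interface WorkspaceFs {
        /** Mirrors FileSystem.access: returns normally if permitted, throws otherwise. */
        void access(String path) throws IOException;

        /** Mirrors FileSystem.listStatus: lists every entry, so cost grows with directory size. */
        String[] listStatus(String path) throws IOException;
    }

    /** Returns true if the workspace path is accessible to the current user. */
    static boolean accessible(WorkspaceFs fs, String workspacePath) {
        try {
            fs.access(workspacePath);
            return true;
        } catch (AccessDeniedException e) {
            // Definitive answer from the new API: the user lacks permission.
            return false;
        } catch (UnsupportedOperationException | IOException e) {
            // The new API failed for some other reason (for example, it is not
            // supported by this file system), so fall back to the old check.
            return accessibleViaListing(fs, workspacePath);
        }
    }

    /** Old behaviour: treat any failure to list the directory as "not accessible". */
    static boolean accessibleViaListing(WorkspaceFs fs, String workspacePath) {
        try {
            fs.listStatus(workspacePath);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

In this sketch only an explicit permission denial is treated as final. Any other failure of the new call is retried through the listing path, which keeps the old behaviour available on setups where the new API does not work.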



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)