You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by ppadma <gi...@git.apache.org> on 2016/11/15 01:32:20 UTC

[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

GitHub user ppadma opened a pull request:

    https://github.com/apache/drill/pull/652

    DRILL-4990:Use new HDFS API access instead of listStatus to check if \u2026

    \u2026users have permissions to access workspace.
    
    Manually tested the fix with impersonation enabled. All unit and regression tests pass.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ppadma/drill DRILL-4990

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #652
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/652#discussion_r88072130
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -151,15 +152,11 @@ public WorkspaceSchemaFactory(
       public boolean accessible(final String userName) throws IOException {
         final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      fs.access(wsPath, FsAction.READ);
         } catch (final UnsupportedOperationException e) {
    -      logger.trace("The filesystem for this workspace does not support this operation.", e);
    +      logger.debug("The filesystem for this workspace does not support this operation.", e);
    --- End diff --
    
    We should return false in this case as well. Better to have a local boolean variable (like boolean isAccessAllowed;) and set it accordingly in try and catch. Then have just 1 return statement in the end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #652: DRILL-4990:Use new HDFS API access instead of listStatus t...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on the issue:

    https://github.com/apache/drill/pull/652
  
    Based on the last in-person discussion it was decided to further look into why the test was failing on Windows platform. Is it the test environment setup issue or an actual issue w.r.t platform implementation of access method ? 


---

[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/652#discussion_r88097089
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -151,15 +152,11 @@ public WorkspaceSchemaFactory(
       public boolean accessible(final String userName) throws IOException {
         final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      fs.access(wsPath, FsAction.READ);
         } catch (final UnsupportedOperationException e) {
    -      logger.trace("The filesystem for this workspace does not support this operation.", e);
    +      logger.debug("The filesystem for this workspace does not support this operation.", e);
    --- End diff --
    
    I am not returning false to be consistent with the existing behavior.  If the user does not have permission, we fail when we try to read the file during execution. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #652: DRILL-4990:Use new HDFS API access instead of listStatus t...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/652
  
    This pull request was never merged because of a problem with windows test setup we have.  As a workaround, I added code to fall back to using old API if new API fails for some reason. All tests are passing fine with this change. 
    This is nice to include in 1.12 as it provides performance improvement for all DFS based queries especially when there are large number of files. 
    Can we review the new diffs please ?


---

[GitHub] drill issue #652: DRILL-4990:Use new HDFS API access instead of listStatus t...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/652
  
    @sohami Sorabh, since you reviewed the original pull request, can you please review the updated diffs ?


---

[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/652#discussion_r88072322
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -151,15 +152,11 @@ public WorkspaceSchemaFactory(
       public boolean accessible(final String userName) throws IOException {
         final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    --- End diff --
    
    Also can we please replace the comments with why we are using "fs.access" api instead. Will be good for future use.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #652: DRILL-4990:Use new HDFS API access instead of listStatus t...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on the issue:

    https://github.com/apache/drill/pull/652
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/652#discussion_r148991415
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
        */
       public boolean accessible(final String userName) throws IOException {
         final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    boolean tryListStatus = false;
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      // access API checks if a user has certain permissions on a file or directory.
    +      // returns normally if requested permissions are granted and throws an exception
    +      // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570).
    +      // It is less expensive (than listStatus which was being used before) and hides the
    +      // complicated access control logic underneath.
    +      fs.access(wsPath, FsAction.READ);
         } catch (final UnsupportedOperationException e) {
    -      logger.trace("The filesystem for this workspace does not support this operation.", e);
    +      logger.debug("The filesystem for this workspace does not support access operation.", e);
    +      tryListStatus = true;
         } catch (final FileNotFoundException | AccessControlException e) {
    -      return false;
    +      logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e);
    +      tryListStatus = true;
    --- End diff --
    
    Is access function never trustworthy for negative cases ? Or is it windows platform specific issue ?


---

[GitHub] drill issue #652: DRILL-4990:Use new HDFS API access instead of listStatus t...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/652
  
    I did not add new unit tests because existing tests already provide enough coverage and they run on local file system. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/652#discussion_r148991210
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
        */
       public boolean accessible(final String userName) throws IOException {
         final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    boolean tryListStatus = false;
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      // access API checks if a user has certain permissions on a file or directory.
    +      // returns normally if requested permissions are granted and throws an exception
    +      // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570).
    +      // It is less expensive (than listStatus which was being used before) and hides the
    +      // complicated access control logic underneath.
    +      fs.access(wsPath, FsAction.READ);
         } catch (final UnsupportedOperationException e) {
    -      logger.trace("The filesystem for this workspace does not support this operation.", e);
    +      logger.debug("The filesystem for this workspace does not support access operation.", e);
    +      tryListStatus = true;
         } catch (final FileNotFoundException | AccessControlException e) {
    -      return false;
    +      logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e);
    +      tryListStatus = true;
    +    }
    +
    +    // if fs.access fails for some reason, fall back to listStatus.
    +    if (tryListStatus) {
    +      try {
    +        fs.listStatus(wsPath);
    +      } catch (final UnsupportedOperationException e) {
    +        logger.debug("The filesystem for this workspace does not support listStatus operation.", e);
    +      } catch (final FileNotFoundException | AccessControlException e) {
    +        logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e);
    --- End diff --
    
    let's add `using listStatus` in the log statement to differentiate with previous case


---

[GitHub] drill issue #652: DRILL-4990:Use new HDFS API access instead of listStatus t...

Posted by kkhatua <gi...@git.apache.org>.
Github user kkhatua commented on the issue:

    https://github.com/apache/drill/pull/652
  
    @ppadma can you rebase this with the current master, and include the workaround? It does not make sense to hold up this commit for so long if a workaround for the Windows platform is sufficient.


---

[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/652#discussion_r149173203
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
        */
       public boolean accessible(final String userName) throws IOException {
         final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    boolean tryListStatus = false;
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      // access API checks if a user has certain permissions on a file or directory.
    +      // returns normally if requested permissions are granted and throws an exception
    +      // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570).
    +      // It is less expensive (than listStatus which was being used before) and hides the
    +      // complicated access control logic underneath.
    +      fs.access(wsPath, FsAction.READ);
         } catch (final UnsupportedOperationException e) {
    -      logger.trace("The filesystem for this workspace does not support this operation.", e);
    +      logger.debug("The filesystem for this workspace does not support access operation.", e);
    +      tryListStatus = true;
         } catch (final FileNotFoundException | AccessControlException e) {
    -      return false;
    +      logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e);
    +      tryListStatus = true;
    --- End diff --
    
    It can be trusted fine for regular file systems HDFS, MapR FS, Mac OS Extended File System etc.  I am not sure if it works for windows or not. We were never able to figure out whether it is a problem with the test setup or it is actually not supported. 
    
    Updated the diffs with your review comments taken care of. Please take a look.



---