You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/23 15:33:13 UTC

[GitHub] [iceberg] cccs-jc opened a new pull request #1979: Support for role based access of HadoopCatalog table listing #1941

cccs-jc opened a new pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979


   Modified the implementation of the HadoopCatalog to avoid using the deprecated isDirectory method.
   
   Instead used the getFileStatus() method to get both the fact that a path exists and is a directory in a single call.
   
   Created two methods to determine isTableDir() and isDirectory() which catch any I/O exception which are thrown when a user/client does not have access to certain folders. 
   
   http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#isDirectory-org.apache.hadoop.fs.Path-
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-759101873


   Thanks, @cccs-jc! The tests are passing after a restart, so I'll merge this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r550685284



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;

Review comment:
       Should the boolean be configurable? How are you setting it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-755500079


   sounds good. I will narrow it to the Azure package and also narrow it to when it is listing a table or namespace only.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] steveloughran commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-763609463


   FWIW, [HADOOP-15710](https://issues.apache.org/jira/browse/HADOOP-15710) first reported this issue in 2018. Someone needs to fix this. Will escalate


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r549734182



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;

Review comment:
       We use the iceberg HadoopCatalog over Azure ABFS, Users use spark to list the tables in the catalog. However not every user has access to all tables on disk (we use file system access control list). When a user cannot access a folder the listStatus function will throw a subclass of IOException.
   
   In my ticket I suggested using a flag to configure the HadoopCatalog to throw errors when unable to list these directories or return false if it is expected that some users will not be able to list all folders/tables.
   
   [https://github.com/apache/iceberg/issues/1941](https://github.com/apache/iceberg/issues/1941)
   
   The listStatus throws an FileNotFoundException when the path provided does not exists. I will handle that as a "normal" condition, any other IOException I will either catch or not based on a configuration flag.  This should handle scenarios where not all user's can access all tables/namespaces
   
   What do you think about that proposal?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc edited a comment on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc edited a comment on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-755543018


   I've checked everywhere the following methods are used
   
   isTableDir
   isDirectory
   isNamespace
   
   In all cases I can see it makes sense to not throw the exception since these are tests. For example
   ```
     createNamespace()
       if isNamespace
         throw AlreadyExistsException
   	  
       if it does not exists it will try to create it.
       if a user for some reason can't list the dir it will return false and
       proceed to try to create it, at that point it will fail with a IOException
       showing why (could be restricted access or any other reason)
       
     
     listNamespaces()
       if not isNamespace
         throws NoSuchNamespaceException
     
       then proceeds to list the namespaces under that location, so like listTables it might be a partial list
       
     dropNamespace()
       if can't access it will consider it does not exists and return false (not dropped)
     
     
     loadNamespaceMetadata()
       again if can't access namespace a NoSuchNamespaceException is returned
     ```
     
   Nonetheless I will add checks for determining if the exception is an azure blob exception. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r553696411



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -158,6 +167,44 @@ public String name() {
     return catalogName;
   }
 
+  private boolean shouldSuppressIOException(IOException ioException) {
+    String name = ioException.getClass().getName();
+    return suppressIOException && name.startsWith("org.apache.hadoop.fs.azurebfs");

Review comment:
       That sounds good to me!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r553052355



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -129,6 +136,8 @@ public void initialize(String name, Map<String, String> properties) {
     this.fs = Util.getFs(new Path(warehouseLocation), conf);
 
     String fileIOImpl = properties.get(CatalogProperties.FILE_IO_IMPL);
+    this.suppressIOException = Boolean.parseBoolean(properties.get(CatalogProperties.HADOOP_SUPPRESS_IOEXCEPTION));
+

Review comment:
       Can you move this to after `fileIO` is set? The line above and that line are related so we shouldn't separate them.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r553052634



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -158,6 +167,44 @@ public String name() {
     return catalogName;
   }
 
+  private boolean shouldSuppressIOException(IOException ioException) {
+    String name = ioException.getClass().getName();
+    return suppressIOException && name.startsWith("org.apache.hadoop.fs.azurebfs");
+  }
+
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (FileNotFoundException e) {
+      return false;
+    } catch (IOException e) {
+      if (shouldSuppressIOException(e)) {
+        LOG.warn("Unalbe to metadata directory {}: {}", metadataPath, e);

Review comment:
       Typo: Unalbe -> Unable

##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -158,6 +167,44 @@ public String name() {
     return catalogName;
   }
 
+  private boolean shouldSuppressIOException(IOException ioException) {
+    String name = ioException.getClass().getName();
+    return suppressIOException && name.startsWith("org.apache.hadoop.fs.azurebfs");
+  }
+
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (FileNotFoundException e) {
+      return false;
+    } catch (IOException e) {
+      if (shouldSuppressIOException(e)) {
+        LOG.warn("Unalbe to metadata directory {}: {}", metadataPath, e);
+        return false;
+      } else {
+        throw new UncheckedIOException(e);
+      }
+    }
+  }
+
+  private boolean isDirectory(Path path) {
+    try {
+      return fs.getFileStatus(path).isDirectory();
+    } catch (FileNotFoundException e) {
+      return false;
+    } catch (IOException e) {
+      if (shouldSuppressIOException(e)) {
+        LOG.warn("Unalbe to list directory {}: {}", path, e);

Review comment:
       Same typo.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r553052121



##########
File path: core/src/main/java/org/apache/iceberg/CatalogProperties.java
##########
@@ -34,6 +34,8 @@ private CatalogProperties() {
   public static final String HIVE_CLIENT_POOL_SIZE = "clients";
   public static final int HIVE_CLIENT_POOL_SIZE_DEFAULT = 2;
 
+  public static final String HADOOP_SUPPRESS_IOEXCEPTION = "suppress-io-exception";

Review comment:
       Since this is specific to Hadoop, let's move it into `HadoopCatalog`. I don't think the other catalogs would need this because they do not list using FS operations. A private constant in `HadoopCatalog` should be fine and we can always expose it later if we need to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-754936363


   Logging the error is definitely a good idea.
   
   It sounds like the main problem is that there isn't a specific error to catch. I think it would be fine if any directory that the user doesn't have access to doesn't show up in the list results. But we do need to ensure that ignoring directories isn't applied to other situations where it isn't reasonable. So if we can inspect the exception from Azure and make this work I think that's the way to go. We can support other errors later as we find them.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r551472785



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +142,37 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (FileNotFoundException e) {
+      return false;
+    } catch (IOException e) {
+      if (suppressACLExcpetion) {

Review comment:
       fair enough.
   
   I'm going to add this configuration like it is done for the FILE_IO_IMPL for example.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-754814225


   I'm still concerned about adding this without a better check. Adding a flag to turn all `IOException` instances into a `false` response seems really brittle. Can we inspect the exception message? Is there a cause that is available? Maybe we can at least check the package of the exception instance?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r549798575



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;

Review comment:
       I think it makes sense to catch `FileNotFoundException` and return false. What is the exception class that you get when a user can't list path? I would rather use that instead of suppressing any `IOException` if there is a flag set.
   
   It is also unclear how that suppression is set.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-755543018


   I've checked everywhere the following methods are used
   
   isTableDir
   isDirectory
   isNamespace
   
   In all cases I can see it makes sense to not throw the exception since these are tests. For example
   
   createNamespace()
     if isNamespace
       throw AlreadyExistsException
   	
     if it does not exists it will try to create it.
     if a user for some reason can't list the dir it will return false and proceed to try to create it, at that point it will fail with a IOException showing why (could be restricted access or any other reason)
     
   
   listNamespaces()
     if not isNamespace
       throws NoSuchNamespaceException
   
     then proceeds to list the namespaces under that location, so like listTables it might be a partial list
     
   dropNamespace()
     if can't access it will consider it does not exists and return false (not dropped)
   
   
   loadNamespaceMetadata
     again if can't access namespace a NoSuchNamespaceException is returned
     
     
   Nonetheless I will add checks for determining if the exception is an azure blob exception. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r549734182



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;

Review comment:
       We use the iceberg HadoopCatalog over Azure ABFS, Users use spark to list the tables in the catalog. However not every user has access to all tables on disk (we use file system access control list). When a user cannot access a folder the listStatus function will throw an IOException.
   
   In my ticket I suggested using a flag to configure the HadoopCatalog to throw errors when unable to list these directories or return false if it is expected that some users will not be able to list all folders/tables.
   
   What do you think about that proposal?
   
   [https://github.com/apache/iceberg/issues/1941](https://github.com/apache/iceberg/issues/1941)
     




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r553696599



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/ConfigProperties.java
##########
@@ -25,4 +25,5 @@ private ConfigProperties() {
   }
 
   public static final String ENGINE_HIVE_ENABLED = "iceberg.engine.hive.enabled";
+

Review comment:
       Nit: this file doesn't need to be touched. Could you remove this whitespace change? It may cause git conflicts.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r551065211



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;

Review comment:
       Hey Ryan, sorry for the late reply. Ya I'm thinking it could support a flag passed in as a configuration via the org.apache.hadoop.conf.Configuration
   
   I'm just not sure how to pass values into that configuration object. For example there is already a FILE_IO_IMPL supported. How would I add an additional configuration like FILE_IO_IMPL?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-756904527


   Hey Ryan,
   
   Made the change. Should be good to go now. Please have a final spot check.
   Cheers


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r549468068



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;

Review comment:
       Same here. Returning `false` is not correct.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r549467907



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;
+    }
+  }
+
+  private boolean isDirectory(Path path) {
+    try {
+      return fs.getFileStatus(path).isDirectory();
+    } catch (IOException e) {
+      return false;

Review comment:
       This should throw either `RuntimeIOException` or `UncheckedIOException`. If the status can't be loaded, returning false is not correct.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-754869962


   sure I could check if it's an azure blob storage exception but that would make it specific to azure..
   
   This is the outline of the listTables method. In the end if a location can't be read it will return NoSuchNamespaceException. If for any reason you cannot list the dir/metadata folder it won't return the table.
   
   So the fact that IOException are not return will not break anything. Worst case users don't see the namespace/tables and hunt down why that is. Maybe logging the error would be helpful while still returning false. My goal here is to not interrupt the execution of listTables if certain user's can't access certain tables. So having a log entry might be useful for the admins/operators.
   
       tables listTables(location)
         if !isDirectory(location) (tries to list "location", if any exception return false)
   	    throw NoSuchNamespaceException
   
           for dir in location
             if dir.isFolder()
               if isTableDir(dir) (tries to list "dir/metadata", if any exception return false)
                 add to list of tables to return
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] steveloughran commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r560909406



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;

Review comment:
       this is really brittle design. It will only work for one specific store and contains the assumption that the store always returns the string "AuthorizationPermissionMismatch" on an error. If the abfs store *ever* changed that string everything will break.
   
   * That abfs exception also includes a status code, 403, which is what should really be used as the signal.
   * Other stores return AccessDeniedException. Has anyone considered filing a hadoop bug asking for translation of rest exceptions with 403s into those? I am reasonably confident nobody is going reject PRs there.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r551425911



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -80,6 +82,7 @@
   private final String warehouseLocation;
   private final FileSystem fs;
   private final FileIO fileIO;
+  private boolean suppressACLExcpetion = false;

Review comment:
       ```suggestion
     private boolean suppressACLException = false;
   ```

##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +142,37 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (FileNotFoundException e) {
+      return false;
+    } catch (IOException e) {
+      if (suppressACLExcpetion) {

Review comment:
       These exceptions could be thrown be errors that don't have anything to do with ACL so can we try come up with a more generic name for this flag? `suppressIOException` maybe? I guess it depends on where they will be configured and how.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r553053591



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;

Review comment:
       Getting the config from catalog properties looks good, but I'd move the property constant as I noted above.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r553052558



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -158,6 +167,44 @@ public String name() {
     return catalogName;
   }
 
+  private boolean shouldSuppressIOException(IOException ioException) {
+    String name = ioException.getClass().getName();
+    return suppressIOException && name.startsWith("org.apache.hadoop.fs.azurebfs");

Review comment:
       I would really prefer if this were more specific and looked at the exception message as well to ensure the error is a permissions or ACL problem.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-756904527


   Hey Ryan,
   
   Made the change. Should be good to go now. Please have a final spot check.
   Cheers


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r550204481



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -139,6 +139,25 @@ public String name() {
     return catalogName;
   }
 
+  private boolean isTableDir(Path path) {
+    Path metadataPath = new Path(path, "metadata");
+    // Only the path which contains metadata is the path for table, otherwise it could be
+    // still a namespace.
+    try {
+      return fs.listStatus(metadataPath, TABLE_FILTER).length >= 1;
+    } catch (IOException e) {
+      return false;

Review comment:
       I've added support for a flag. However I have not yet retrieved the value from the configuration properties.
   
   The exception thrown is this one:
   
   https://hadoop.apache.org/docs/r3.2.0/api/org/apache/hadoop/fs/azurebfs/contracts/exceptions/AbfsRestOperationException.html
   
   IOException
       AzureBlobFileSystemException
           AbfsRestOperationException
   
   Unfortunately the exception thrown is an ABFS specific class.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-756497634


   @cccs-jc, thanks for updating this! Other than the file that doesn't need to be modified and renaming the config property, I think this is ready to go. Thanks for getting this done!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-jc commented on a change in pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on a change in pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#discussion_r553469714



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -158,6 +167,44 @@ public String name() {
     return catalogName;
   }
 
+  private boolean shouldSuppressIOException(IOException ioException) {
+    String name = ioException.getClass().getName();
+    return suppressIOException && name.startsWith("org.apache.hadoop.fs.azurebfs");

Review comment:
       I have fixed the code review suggestions.
   
   I'm detecting the azure blob permission exception using the message (that's all I have available). In that message there is an AzureServiceErrorCode of value "AuthorizationPermissionMismatch" which I use to detect a permission issue.
   
   I agree this is much better error handling since it will let other errors be reported.
   
   I'm thinking the flag should thus be "suppress permission errors". What do you think? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1979: Support for role based access of HadoopCatalog table listing #1941

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1979:
URL: https://github.com/apache/iceberg/pull/1979#issuecomment-751844677


   Thanks for working on this, @cccs-jc. I like that it avoids extra calls to the NameNode.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org