You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "rdblue (via GitHub)" <gi...@apache.org> on 2023/03/09 23:44:38 UTC

[GitHub] [iceberg] rdblue opened a new pull request, #7064: Core: Allow customizing FileIO in REST catalog

rdblue opened a new pull request, #7064:
URL: https://github.com/apache/iceberg/pull/7064

   This adds `setFileIOBuilder(Function<Map<String, String>, FileIO)` to the `RESTSessionCatalog`. This allows another way to customize the `FileIO` instances used by the REST catalog. For example, Trino has custom `FileIO` implementations that manage resources differently than `S3FileIO`.
   
   The function accepts a string map of config properties, which allows passing context from the catalog and table level for the IO, including `token` for remote S3 signing as well as the `s3.*` properties for access to S3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#discussion_r1132711344


##########
core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java:
##########
@@ -116,6 +122,11 @@ public void initialize(String name, Map<String, String> properties) {
     }
   }
 
+  public void setFileIOBuilder(Function<Map<String, String>, FileIO> ioBuilder) {

Review Comment:
   I looked into adding this to `BaseMetastoreCatalog` but it didn't make sense. That class is not `Configurable` and creating an IO requires a `Configuration` when the subclasses implement it. As a result, this class would need to override both this method and a `newFileIO` method so it isn't worth it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] electrum commented on pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "electrum (via GitHub)" <gi...@apache.org>.
electrum commented on PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#issuecomment-1463193052

   We'll need the same thing for `JdbcCatalog`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#discussion_r1132740134


##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -98,6 +98,7 @@ public class RESTSessionCatalog extends BaseSessionCatalog
           OAuth2Properties.SAML1_TOKEN_TYPE);
 
   private final Function<Map<String, String>, RESTClient> clientBuilder;
+  private Function<Map<String, String>, FileIO> ioBuilder;

Review Comment:
   Oops. This was initially final. Good catch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#discussion_r1132741345


##########
core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java:
##########
@@ -116,6 +122,11 @@ public void initialize(String name, Map<String, String> properties) {
     }
   }
 
+  public void setFileIOBuilder(Function<Map<String, String>, FileIO> ioBuilder) {

Review Comment:
   I'd like to see whether we need this more broadly first. It's a pretty narrow use case at the moment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#issuecomment-1463001414

   @electrum can you take a look and see if this would meet your needs?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a diff in pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "danielcweeks (via GitHub)" <gi...@apache.org>.
danielcweeks commented on code in PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#discussion_r1132734422


##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -98,6 +98,7 @@ public class RESTSessionCatalog extends BaseSessionCatalog
           OAuth2Properties.SAML1_TOKEN_TYPE);
 
   private final Function<Map<String, String>, RESTClient> clientBuilder;
+  private Function<Map<String, String>, FileIO> ioBuilder;

Review Comment:
   nit: we've got a weird mix of explicitly initializing some members to `null` and other's relying on the implicit initialization (e.g. here is implicit and in the `io` member above it's explicit).  I've vote for just not setting the value to null since that's the expected and well understood behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue merged PR #7064:
URL: https://github.com/apache/iceberg/pull/7064


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a diff in pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "danielcweeks (via GitHub)" <gi...@apache.org>.
danielcweeks commented on code in PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#discussion_r1132731306


##########
core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java:
##########
@@ -116,6 +122,11 @@ public void initialize(String name, Map<String, String> properties) {
     }
   }
 
+  public void setFileIOBuilder(Function<Map<String, String>, FileIO> ioBuilder) {

Review Comment:
   Seeing as how we're duplicating this API across some catalogs, should we pull the set/create into an interface?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#discussion_r1132722008


##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -762,16 +765,25 @@ private String fullTableName(TableIdentifier ident) {
     return String.format("%s.%s", name(), ident);
   }
 
+  private FileIO newFileIO(Map<String, String> properties) {
+    if (null != ioBuilder) {
+      return ioBuilder.apply(properties);
+    } else {
+      String ioImpl = properties.get(CatalogProperties.FILE_IO_IMPL);
+      return
+          CatalogUtil.loadFileIO(
+              ioImpl != null ? ioImpl : ResolvingFileIO.class.getName(), properties, conf);

Review Comment:
   I updated this to be more like the JDBC catalog, where we use `getOrDefault` on the line above to clean it up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] electrum commented on pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "electrum (via GitHub)" <gi...@apache.org>.
electrum commented on PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#issuecomment-1463192830

   I tested this out and it works for us. The simple integration where we ignore config is good enough to have everything go through `TrinoFileSystem` and avoid the need for `HadoopFileIO`:
   ```java
   icebergCatalogInstance.setFileIOBuilder(config -> new ForwardingFileIo(fileSystemFactory.create(identity)));
   ```
   We can extend this to capture the config for overriding S3 credentials, signer, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7064: Core: Allow customizing FileIO in REST catalog

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7064:
URL: https://github.com/apache/iceberg/pull/7064#discussion_r1132081630


##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -762,16 +765,25 @@ private String fullTableName(TableIdentifier ident) {
     return String.format("%s.%s", name(), ident);
   }
 
+  private FileIO newFileIO(Map<String, String> properties) {
+    if (null != ioBuilder) {
+      return ioBuilder.apply(properties);
+    } else {
+      String ioImpl = properties.get(CatalogProperties.FILE_IO_IMPL);
+      return
+          CatalogUtil.loadFileIO(
+              ioImpl != null ? ioImpl : ResolvingFileIO.class.getName(), properties, conf);

Review Comment:
   nit: maybe worth inlining this, so that we don't need the null check on `ioImpl`:
   ```
   CatalogUtil.loadFileIO(
             properties.getOrDefault(CatalogProperties.FILE_IO_IMPL, ResolvingFileIO.class.getName()),
             properties,
             conf);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org