You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/17 11:12:08 UTC

[GitHub] [iceberg] snazy opened a new pull request #2345: Nessie: assert that `NessieCatalog` is initialized

snazy opened a new pull request #2345:
URL: https://github.com/apache/iceberg/pull/2345


   Tell users (when not using `CatalogUtil`) that the `NessieCatalog` needs to be initialized before it can be used. Also cleans up (closes) the `NessieClient` when initialization fails, plus a few more verbose NPE-checks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy closed pull request #2345: Nessie: assert that `NessieCatalog` is initialized

Posted by GitBox <gi...@apache.org>.
snazy closed pull request #2345:
URL: https://github.com/apache/iceberg/pull/2345


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2345: Nessie: assert that `NessieCatalog` is initialized

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2345:
URL: https://github.com/apache/iceberg/pull/2345#discussion_r597299531



##########
File path: nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java
##########
@@ -78,31 +78,63 @@
   private UpdateableReference reference;
   private String name;
   private FileIO fileIO;
+  private boolean initialized;
 
   public NessieCatalog() {
   }
 
   @Override
   public void initialize(String inputName, Map<String, String> options) {
-    String fileIOImpl = options.get(CatalogProperties.FILE_IO_IMPL);
-    this.fileIO = fileIOImpl == null ? new HadoopFileIO(config) : CatalogUtil.loadFileIO(fileIOImpl, options, config);
-    this.name = inputName == null ? "nessie" : inputName;
-    // remove nessie prefix
-    final Function<String, String> removePrefix = x -> x.replace("nessie.", "");
+    if (initialized) {
+      close();
+    }
+    if (config == null) {
+      throw new IllegalStateException(String.format("setConf() must be called before initialize() for NessieCatalog '%s'", name));
+    }
+
+    try {
+      String fileIOImpl = options.get(CatalogProperties.FILE_IO_IMPL);
+      this.fileIO = fileIOImpl == null ? new HadoopFileIO(config) : CatalogUtil.loadFileIO(fileIOImpl, options, config);
+      this.name = inputName == null ? "nessie" : inputName;
+      // remove nessie prefix
+      final Function<String, String> removePrefix = x -> x.replace("nessie.", "");
 
-    this.client = NessieClient.builder().fromConfig(x -> options.get(removePrefix.apply(x))).build();
+      this.client = NessieClient.builder().fromConfig(x -> options.get(removePrefix.apply(x))).build();
 
-    this.warehouseLocation = options.get(CatalogProperties.WAREHOUSE_LOCATION);
-    if (warehouseLocation == null) {
-      throw new IllegalStateException("Parameter warehouse not set, nessie can't store data.");
+      this.warehouseLocation = options.get(CatalogProperties.WAREHOUSE_LOCATION);
+      if (warehouseLocation == null) {
+        throw new IllegalStateException("Parameter warehouse not set, nessie can't store data.");
+      }
+      final String requestedRef = options.get(removePrefix.apply(NessieConfigConstants.CONF_NESSIE_REF));
+      this.reference = loadReference(requestedRef);
+
+      this.initialized = true;
+    } catch (RuntimeException e) {
+      close();
+      throw new RuntimeException("Failed to initialize NessieCatalog with options " + options);
+    }
+  }
+
+  private void assertInitialized() {

Review comment:
       We could add static `create` methods that accept the `initialize` arguments. Then the class could handle it internally:
   
   ```java
   public static NessieCatalog create(String name, Map<String, String> config) {
     NessieCatalog newCatalog = new NessieCatalog();
     newCatalog.initialize(name, config);
     return newCatalog;
   }
   ```
   
   That seems like the right way to handle direct catalog creation, since we no longer support the direct constructors.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2345: Nessie: assert that `NessieCatalog` is initialized

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2345:
URL: https://github.com/apache/iceberg/pull/2345#discussion_r595972212



##########
File path: nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java
##########
@@ -78,31 +78,63 @@
   private UpdateableReference reference;
   private String name;
   private FileIO fileIO;
+  private boolean initialized;
 
   public NessieCatalog() {
   }
 
   @Override
   public void initialize(String inputName, Map<String, String> options) {
-    String fileIOImpl = options.get(CatalogProperties.FILE_IO_IMPL);
-    this.fileIO = fileIOImpl == null ? new HadoopFileIO(config) : CatalogUtil.loadFileIO(fileIOImpl, options, config);
-    this.name = inputName == null ? "nessie" : inputName;
-    // remove nessie prefix
-    final Function<String, String> removePrefix = x -> x.replace("nessie.", "");
+    if (initialized) {
+      close();
+    }
+    if (config == null) {
+      throw new IllegalStateException(String.format("setConf() must be called before initialize() for NessieCatalog '%s'", name));

Review comment:
       Iceberg logs are of the format `Cannot <do something>. <reason>. [possible fix]` 

##########
File path: nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java
##########
@@ -78,31 +78,63 @@
   private UpdateableReference reference;
   private String name;
   private FileIO fileIO;
+  private boolean initialized;
 
   public NessieCatalog() {
   }
 
   @Override
   public void initialize(String inputName, Map<String, String> options) {
-    String fileIOImpl = options.get(CatalogProperties.FILE_IO_IMPL);
-    this.fileIO = fileIOImpl == null ? new HadoopFileIO(config) : CatalogUtil.loadFileIO(fileIOImpl, options, config);
-    this.name = inputName == null ? "nessie" : inputName;
-    // remove nessie prefix
-    final Function<String, String> removePrefix = x -> x.replace("nessie.", "");
+    if (initialized) {

Review comment:
       I am not sure we should support 2x initialization

##########
File path: nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java
##########
@@ -78,31 +78,63 @@
   private UpdateableReference reference;
   private String name;
   private FileIO fileIO;
+  private boolean initialized;
 
   public NessieCatalog() {
   }
 
   @Override
   public void initialize(String inputName, Map<String, String> options) {
-    String fileIOImpl = options.get(CatalogProperties.FILE_IO_IMPL);
-    this.fileIO = fileIOImpl == null ? new HadoopFileIO(config) : CatalogUtil.loadFileIO(fileIOImpl, options, config);
-    this.name = inputName == null ? "nessie" : inputName;
-    // remove nessie prefix
-    final Function<String, String> removePrefix = x -> x.replace("nessie.", "");
+    if (initialized) {
+      close();
+    }
+    if (config == null) {
+      throw new IllegalStateException(String.format("setConf() must be called before initialize() for NessieCatalog '%s'", name));
+    }
+
+    try {
+      String fileIOImpl = options.get(CatalogProperties.FILE_IO_IMPL);
+      this.fileIO = fileIOImpl == null ? new HadoopFileIO(config) : CatalogUtil.loadFileIO(fileIOImpl, options, config);
+      this.name = inputName == null ? "nessie" : inputName;
+      // remove nessie prefix
+      final Function<String, String> removePrefix = x -> x.replace("nessie.", "");
 
-    this.client = NessieClient.builder().fromConfig(x -> options.get(removePrefix.apply(x))).build();
+      this.client = NessieClient.builder().fromConfig(x -> options.get(removePrefix.apply(x))).build();
 
-    this.warehouseLocation = options.get(CatalogProperties.WAREHOUSE_LOCATION);
-    if (warehouseLocation == null) {
-      throw new IllegalStateException("Parameter warehouse not set, nessie can't store data.");
+      this.warehouseLocation = options.get(CatalogProperties.WAREHOUSE_LOCATION);
+      if (warehouseLocation == null) {
+        throw new IllegalStateException("Parameter warehouse not set, nessie can't store data.");
+      }
+      final String requestedRef = options.get(removePrefix.apply(NessieConfigConstants.CONF_NESSIE_REF));
+      this.reference = loadReference(requestedRef);
+
+      this.initialized = true;
+    } catch (RuntimeException e) {
+      close();
+      throw new RuntimeException("Failed to initialize NessieCatalog with options " + options);
+    }
+  }
+
+  private void assertInitialized() {

Review comment:
       All catalogs have to perform the same steps: {no-arg constructor, setConf, initialize} in that order. I wonder if there is a way to extend this check beyond the nessie catalog. Some ideas:
   
   * enforce in base class
   * don't support initialization outside of `CatalogUtil`
   
   It seems like adding this logic into every `Catalog` and expecting users to know how they have to be constructed in order is fragile
   
   @rdblue @aokolnychyi @RussellSpitzer what do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2345: Nessie: assert that `NessieCatalog` is initialized

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2345:
URL: https://github.com/apache/iceberg/pull/2345#discussion_r597298621



##########
File path: nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java
##########
@@ -78,31 +78,64 @@
   private UpdateableReference reference;
   private String name;
   private FileIO fileIO;
+  private boolean initialized;
 
   public NessieCatalog() {
   }
 
   @Override
   public void initialize(String inputName, Map<String, String> options) {
-    String fileIOImpl = options.get(CatalogProperties.FILE_IO_IMPL);
-    this.fileIO = fileIOImpl == null ? new HadoopFileIO(config) : CatalogUtil.loadFileIO(fileIOImpl, options, config);
-    this.name = inputName == null ? "nessie" : inputName;
-    // remove nessie prefix
-    final Function<String, String> removePrefix = x -> x.replace("nessie.", "");
+    if (initialized) {
+      close();
+    }
+    if (config == null) {

Review comment:
       Missing a newline between control flow statements.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org