You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/02/03 15:08:33 UTC

[GitHub] [iceberg] rymurr opened a new pull request #2203: HiveCatalog and HiveClientPool fixes

rymurr opened a new pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203


   * remove option to create pool w/o a defined size
   * deprecate last no-arg constructor in HiveCatalog
   * move tests to use no-arg constructor
   
   **NOTE:** the `HiveCatalog` copies a new Hadoop Configuration which is a breaking change from how the constructor used to work. See `HiveMetastoreTest` changes for ramifications.
   
   As discussed in https://github.com/apache/iceberg/pull/2180#issuecomment-770062242


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#issuecomment-773347726


   > > NOTE: the HiveCatalog copies a new Hadoop Configuration which is a breaking change from how the constructor used to work. See HiveMetastoreTest changes for ramifications.
   > 
   > Can you be more clear? I don't see anything major. What do you mean?
   
   `setConf` of `HiveCatalog` does `new Configuration(conf)` whereas the constructor we are removing doesn't copy it. To ensure changes to hadoop configs is visible in `HiveCatalog` the user has to change or re-`setConf`. Not a huge issue and probably the correct behaviour as far as we are concerned. But it did require changing a test and may have repercussions for end users.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#issuecomment-773373588


   Thanks for the review @rdblue I have updated based on your comments. I have also gone ahead and removed the deprecated constructors but that kicked off a few other changes. Let me know if you prefer that as a separate PR


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#discussion_r592856169



##########
File path: flink/src/main/java/org/apache/iceberg/flink/CatalogLoader.java
##########
@@ -109,7 +109,7 @@ private HiveCatalogLoader(String catalogName, Configuration conf, Map<String, St
 
     @Override
     public Catalog loadCatalog() {
-      return new HiveCatalog(catalogName, uri, warehouse, clientPoolSize, hadoopConf.get(), properties);
+      return CatalogUtil.loadCatalog(HiveCatalog.class.getName(), catalogName, properties, hadoopConf.get());

Review comment:
       Question: Shall we use a constant in `CatalogUtil` for this? What way should we promote? Static import for the constant may make the line a bit shorter.

##########
File path: flink/src/test/java/org/apache/iceberg/flink/FlinkTestBase.java
##########
@@ -57,7 +59,8 @@ public static void startMetastore() {
     FlinkTestBase.metastore = new TestHiveMetastore();
     metastore.start();
     FlinkTestBase.hiveConf = metastore.hiveConf();
-    FlinkTestBase.catalog = new HiveCatalog(metastore.hiveConf());
+    FlinkTestBase.catalog = (HiveCatalog)

Review comment:
       I wish we parameterized `loadCatalog`.

##########
File path: spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java
##########
@@ -62,7 +64,8 @@ public static void startMetastoreAndSpark() {
         .enableHiveSupport()
         .getOrCreate();
 
-    SparkTestBase.catalog = new HiveCatalog(spark.sessionState().newHadoopConf());
+    SparkTestBase.catalog = (HiveCatalog)
+        CatalogUtil.loadCatalog(HiveCatalog.class.getName(), "hive", ImmutableMap.of(), hiveConf);

Review comment:
       nit: `hiveConf` is not technically the same as `sessionState().newHadoopConf()` but it does not matter in tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#issuecomment-772942654


   > NOTE: the HiveCatalog copies a new Hadoop Configuration which is a breaking change from how the constructor used to work. See HiveMetastoreTest changes for ramifications.
   
   Can you be more clear? I don't see anything major. What do you mean?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#discussion_r570262570



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -74,9 +74,18 @@
   public HiveCatalog() {
   }
 
+  /**
+   * Hive Catalog constructor.
+   *
+   * @deprecated please use the no-arg constructor, setConf and initialize to construct the catalog. Will be removed in
+   * v0.12.0

Review comment:
       done, also removed the `HadoopCatalog` constructors

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalogs.java
##########
@@ -34,6 +36,7 @@ private HiveCatalogs() {
   public static HiveCatalog loadCatalog(Configuration conf) {
     // metastore URI can be null in local mode
     String metastoreUri = conf.get(HiveConf.ConfVars.METASTOREURIS.varname, "");
-    return CATALOG_CACHE.get(metastoreUri, uri -> new HiveCatalog(conf));
+    return CATALOG_CACHE.get(metastoreUri, uri -> (HiveCatalog) CatalogUtil.loadCatalog(HiveCatalog.class.getName(),
+        "hive", ImmutableMap.of(), conf));

Review comment:
       agreed, fixed. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#discussion_r570265490



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalogs.java
##########
@@ -34,6 +36,7 @@ private HiveCatalogs() {
   public static HiveCatalog loadCatalog(Configuration conf) {
     // metastore URI can be null in local mode
     String metastoreUri = conf.get(HiveConf.ConfVars.METASTOREURIS.varname, "");
-    return CATALOG_CACHE.get(metastoreUri, uri -> new HiveCatalog(conf));
+    return CATALOG_CACHE.get(metastoreUri, uri -> (HiveCatalog) CatalogUtil.loadCatalog(HiveCatalog.class.getName(),
+        "hive", ImmutableMap.of(), conf));

Review comment:
       agreed, fixed. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#issuecomment-773347726






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#discussion_r592930310



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -74,78 +73,22 @@
   public HiveCatalog() {
   }
 
-  public HiveCatalog(Configuration conf) {
-    this.name = "hive";
-    this.clients = new HiveClientPool(conf);
-    this.conf = conf;
-    this.createStack = Thread.currentThread().getStackTrace();
-    this.closed = false;
-    this.fileIO = new HadoopFileIO(conf);
-  }
-
-  /**
-   * Hive Catalog constructor.
-   *
-   * @deprecated please use the no-arg constructor, setConf and initialize to construct the catalog. Will be removed in
-   * v0.12.0
-   * @param name catalog name
-   * @param uri Hive metastore uri
-   * @param clientPoolSize size of client pool
-   * @param conf Hadoop Configuration
-   */
-  @Deprecated
-  public HiveCatalog(String name, String uri, int clientPoolSize, Configuration conf) {
-    this(name, uri, null, clientPoolSize, conf);
-  }
-
   /**
    * Hive Catalog constructor.
    *
    * @deprecated please use the no-arg constructor, setConf and initialize to construct the catalog. Will be removed in
-   * v0.12.0
-   * @param name catalog name
-   * @param uri Hive metastore uri
-   * @param warehouse location of Hive warehouse
-   * @param clientPoolSize size of client pool
+   * v0.13.0
    * @param conf Hadoop Configuration
    */
   @Deprecated
-  public HiveCatalog(String name, String uri, String warehouse, int clientPoolSize, Configuration conf) {
-    this(name, uri, warehouse, clientPoolSize, conf, Maps.newHashMap());
-  }
-
-  /**
-   * Hive Catalog constructor.
-   *
-   * @deprecated please use the no-arg constructor, setConf and initialize to construct the catalog. Will be removed in
-   * v0.12.0
-   * @param name catalog name
-   * @param uri Hive metastore uri
-   * @param warehouse location of Hive warehouse
-   * @param clientPoolSize size of client pool
-   * @param conf Hadoop Configuration
-   * @param properties extra Hive configuration properties
-   */
-  @Deprecated
-  public HiveCatalog(
-      String name,
-      String uri,
-      String warehouse,
-      int clientPoolSize,
-      Configuration conf,
-      Map<String, String> properties) {
-
-    Map<String, String> props = new HashMap<>(properties);
-    if (warehouse != null) {
-      props.put(CatalogProperties.WAREHOUSE_LOCATION, warehouse);
-    }
-    if (uri != null) {
-      props.put(CatalogProperties.URI, uri);
-    }
-    props.put(CatalogProperties.CLIENT_POOL_SIZE, Integer.toString(clientPoolSize));
-
-    setConf(conf);
-    initialize(name, props);
+  public HiveCatalog(Configuration conf) {
+    this.name = "hive";
+    int clientPoolSize = conf.getInt(CatalogProperties.CLIENT_POOL_SIZE, CatalogProperties.CLIENT_POOL_SIZE_DEFAULT);

Review comment:
       @rymurr, shall we use `setConf` and `initialize` like you in `HadoopCatalog` PR here too?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#discussion_r569863174



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -74,9 +74,18 @@
   public HiveCatalog() {
   }
 
+  /**
+   * Hive Catalog constructor.
+   *
+   * @deprecated please use the no-arg constructor, setConf and initialize to construct the catalog. Will be removed in
+   * v0.12.0

Review comment:
       0.12.0 is probably the next release. Can you change this to 0.13.0?
   
   Also, now that 0.11.0 is out, can we remove the methods to remove in 0.12.0?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#issuecomment-796258733


   Let me take a look at this now. Sorry I did not get to this one earlier.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#issuecomment-796261775


   The change looks correct to me. @rymurr, shall we split this PR into two and address Hadoop and Hive catalogs separately?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#discussion_r570262570



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -74,9 +74,18 @@
   public HiveCatalog() {
   }
 
+  /**
+   * Hive Catalog constructor.
+   *
+   * @deprecated please use the no-arg constructor, setConf and initialize to construct the catalog. Will be removed in
+   * v0.12.0

Review comment:
       done, also removed the `HadoopCatalog` constructors




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#issuecomment-796723589


   Thanks for taking a look @aokolnychyi I have removed the hadoop changes and rebased


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2203: HiveCatalog and HiveClientPool fixes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2203:
URL: https://github.com/apache/iceberg/pull/2203#discussion_r569864524



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalogs.java
##########
@@ -34,6 +36,7 @@ private HiveCatalogs() {
   public static HiveCatalog loadCatalog(Configuration conf) {
     // metastore URI can be null in local mode
     String metastoreUri = conf.get(HiveConf.ConfVars.METASTOREURIS.varname, "");
-    return CATALOG_CACHE.get(metastoreUri, uri -> new HiveCatalog(conf));
+    return CATALOG_CACHE.get(metastoreUri, uri -> (HiveCatalog) CatalogUtil.loadCatalog(HiveCatalog.class.getName(),
+        "hive", ImmutableMap.of(), conf));

Review comment:
       Minor: I find this line wrapping very awkward because at a glance and from indentation, it looks like these args are passed to `CATALOG_CACHE.get`. But they are actually inside the lambda and passed to `loadCatalog`.
   
   When breaking lines, I prefer to try not to just break wherever a line gets too long and instead break at reasonable points for readability. If possible, I think this should break at `CatalogUtil.loadCatalog` so that the entire method call is on one line. Otherwise, putting all args on one line makes the most sense so that at least the previous line ends with `loadCatalog(` and it is clear that the next line starts the argument list.
   
   Same with the other changes above. Those aren't as bad as this, but I'd still wrap a little differently (not after the first method arg) to increase readability.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org