You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/07 23:21:17 UTC

[GitHub] [iceberg] jackye1995 opened a new pull request #2565: Hive: unify catalog experience across engines

jackye1995 opened a new pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565


   As discussed in #2544, make Hive catalog experience consistent with Spark and Flink, specifically:
   1. when `type` is `null`, look at `catalog-impl`, if that is also `null` then use `HiveCatalog` as default.
   2. there is also no need to have templates `CATALOG_WAREHOUSE_TEMPLATE` and `CATALOG_CLASS_TEMPLATE`, because they are unified in `CatalogProperties`.
   
   I also rewrote all the tests to make each test case clear in separated tests instead of a single giant test, please take a look if I missed any edge case in testing.
   
   @pvary @marton-bod @lcspinter @aokolnychyi @rdblue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#issuecomment-841972814


   @pvary @marton-bod updated and fixed all tests, should be good for another review, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r632390890



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);
+  }
+
+  /**
+   * Get Hadoop config key of catalog warehouse location based on catalog name
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog warehouse location for the catalog name
+   */
+  public static String catalogWarehouseConfigKey(String catalogName) {

Review comment:
       The `warehouse` should be an internal thing for the HadoopCatalog. I think no other Catalog should use that. Do I miss something?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629313822



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {

Review comment:
       The naming schema of the tests seems a bit odd for me.
   Do we use something like it somewhere in the Iceberg code already?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629265247



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);

Review comment:
       shall we add `type` to the `CatalogProperties` as well so we can reuse `catalogPropertyConfigKey` here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633155294



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -267,8 +265,8 @@ public static boolean hiveCatalog(Configuration conf, Properties props) {
    */
   private static String getCatalogType(Configuration conf, String catalogName) {
     if (catalogName != null) {
-      String catalogType = conf.get(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName));
-      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME) || catalogType == null) {
+      String catalogType = conf.get(InputFormatConfig.catalogTypeConfigKey(catalogName));
+      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME)) {

Review comment:
       I moved the doc of this method to `Catalogs` class level, because private method docs are not generated in html.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#issuecomment-841885028


   We have merged the hive doc PR, so I have added updates based on that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629538091



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);
+  }
+
+  /**
+   * Get Hadoop config key of catalog warehouse location based on catalog name
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog warehouse location for the catalog name
+   */
+  public static String catalogWarehouseConfigKey(String catalogName) {

Review comment:
       No. I added this method because the `warehouse` property is commonly used. Technically we can just use `catalogPropertyConfigKey(...)` method for everything.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629266171



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }

Review comment:
       Don't we need a test case for `legacy_custom` too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633144895



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }
 
-    // arbitrary catalog name with non existent catalog type
+  @Test
+  public void testLoadCatalog_unknown() {
     String catalogName = "barCatalog";
-    conf.unset(InputFormatConfig.CATALOG);
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName), nonExistentCatalogType);
+    conf.set(InputFormatConfig.catalogTypeConfigKey(catalogName), "fooType");
     AssertHelpers.assertThrows(
             "should complain about catalog not supported", UnsupportedOperationException.class,
             "Unknown catalog type:", () -> Catalogs.loadCatalog(conf, catalogName));
+  }
 
-    // arbitrary catalog name with hadoop catalog type and default warehouse location
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName),
-            CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
-    hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
-
+  @Test
+  public void testLoadCatalog_hadoop_warehouseLocation() {
+    String catalogName = "barCatalog";
+    conf.set(InputFormatConfig.catalogTypeConfigKey(catalogName), CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
+    conf.set(InputFormatConfig.catalogWarehouseConfigKey(catalogName), "/tmp/mylocation");
+    Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+    Assert.assertEquals("HadoopCatalog{name=barCatalog, location=/tmp/mylocation}", hadoopCatalog.get().toString());

Review comment:
       This was in the original test, so sorry I did not look too much into it. I agree that it's better to test with those methods, but unfortunately `HadoopCatalog.warehouseLocation()` does not exist, so we cannot really use that, so I suppose that is why this test is written in this way to get the warehouse location.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629535504



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {

Review comment:
       not really, let me change it to existing convention




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633146138



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }

Review comment:
       Actually, because the `CATALOG_LOADER_CLASS` (`iceberg.mr.catalog.loader.class`) is no longer used, we cannot load a custom catalog in the legacy mode, so there is no test for it. 
   
   Looking at https://github.com/apache/iceberg/pull/2129/files#diff-c183ea3aa154c2a5012f87d7a06dba3cff3f27975384e9fb4040fe6850a98bd6L192-L193, this seems like a backwards incompatible change. @lcspinter is this intentional, or should we add that part of the logic back?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633151118



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);

Review comment:
       In addition, I am treating `CatalogUtil.ICEBERG_CATALOG_TYPE` as the catalog property to use, and removed reference to `InputFormatConfig.CATALOG_TYPE_TEMPLATE`. I also added documentation inside `CatalogUtil` to make things clear.
   
   I don't personally like the fact that `CatalogUtil.ICEBERG_CATALOG_TYPE` is in `CatalogUtil` instead of `CatalogProperties`, but we already released it in 0.11 as public variables so I decide to just keep it this way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633145438



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -267,8 +265,8 @@ public static boolean hiveCatalog(Configuration conf, Properties props) {
    */
   private static String getCatalogType(Configuration conf, String catalogName) {
     if (catalogName != null) {
-      String catalogType = conf.get(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName));
-      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME) || catalogType == null) {
+      String catalogType = conf.get(InputFormatConfig.catalogTypeConfigKey(catalogName));
+      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME)) {

Review comment:
       updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629355608



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);
+  }
+
+  /**
+   * Get Hadoop config key of catalog warehouse location based on catalog name
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog warehouse location for the catalog name
+   */
+  public static String catalogWarehouseConfigKey(String catalogName) {

Review comment:
       Do we use this in other part of the code than the ones that we will deprecate when we deprecate the old configs?
   If so we might want to add `@deprecated` so nobody will start to use it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629500051



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);

Review comment:
       I see. Which one does Spark use? Would it make sense to standardize w/ the other engines now and use `catalog-type` here too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633150812



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }

Review comment:
       for now, I have updated the code to allow this legacy behavior, and added the missing test.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629264669



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -267,8 +265,8 @@ public static boolean hiveCatalog(Configuration conf, Properties props) {
    */
   private static String getCatalogType(Configuration conf, String catalogName) {
     if (catalogName != null) {
-      String catalogType = conf.get(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName));
-      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME) || catalogType == null) {
+      String catalogType = conf.get(InputFormatConfig.catalogTypeConfigKey(catalogName));
+      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME)) {

Review comment:
       Do we need to update the javadocs of this method according to this change?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#issuecomment-843088341


   Thanks @jackye1995 for the patch and @marton-bod , @lcspinter for the reviews!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#issuecomment-836648885


   Thanks @jackye1995 for this, a few comments. 
   Also, shouldn't we include the hive docs change (re. catalog loading) in this PR too to keep them in lockstep?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633145514



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);

Review comment:
       So while Ryan and Anton have not replied, I will keep it like this, we can always add `catalog-type` as a new catalog properties in a new PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary merged pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
pvary merged pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#issuecomment-837013251


   > shouldn't we include the hive docs change (re. catalog loading) in this PR too to keep them in lockstep?
   
   @marton-bod depending on which one gets merged first, I will make change in the other PR accordingly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633435261



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);
+  }
+
+  /**
+   * Get Hadoop config key of catalog warehouse location based on catalog name
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog warehouse location for the catalog name
+   */
+  public static String catalogWarehouseConfigKey(String catalogName) {

Review comment:
       I might missing something, but we use this `catalogWarehouseConfigKey` only in tests. The reason here is that in the tests we would like to set / get the warehouse for the HadoopCatalog and the CustomCatalog (which is again only a test class). In production code we do not need this key, so I think we should move it from the live code to a test class




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#issuecomment-841885028






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633435261



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);
+  }
+
+  /**
+   * Get Hadoop config key of catalog warehouse location based on catalog name
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog warehouse location for the catalog name
+   */
+  public static String catalogWarehouseConfigKey(String catalogName) {

Review comment:
       I might missing something, but we use this `catalogWarehouseConfigKey` only in tests. The reason here is that in the tests we would like to set / get the warehouse for the HadoopCatalog and the CustomCatalog (which is again only a test class). In production code we do not need this key, so I think we should move it from the live code to a test class




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629534647



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);

Review comment:
       Yeah that's actually one way we can consider, @rdblue @aokolnychyi , would it be acceptable if we unify all engines to ause `catalog-type` instead, and `type` is just an alias for `catalog-type` in Spark and Hive?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633146400



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);
+  }
+
+  /**
+   * Get Hadoop config key of catalog warehouse location based on catalog name
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog warehouse location for the catalog name
+   */
+  public static String catalogWarehouseConfigKey(String catalogName) {

Review comment:
       @pvary it is used in all catalogs except `HiveCatalog`, because Hive uses `HiveConf.ConfVars.METASTOREWAREHOUSE` instead.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629534995



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }

Review comment:
       Yeah there was no test for that, I will add it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629483663



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);

Review comment:
       It was not added because `type` is not consistent across engines. Flink has to use `catalog-type` instead of `type` because `type` is a reserved config key.
   
   With that being said, we can probably add it and specify that Flink has to use `catalog-type`. Let me do that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#issuecomment-843019974


   @lcspinter: could you please take a look, as you are to one most familiar with this code now?
   
   Thanks,
   Peter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629535089



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }
 
-    // arbitrary catalog name with non existent catalog type
+  @Test
+  public void testLoadCatalog_unknown() {
     String catalogName = "barCatalog";
-    conf.unset(InputFormatConfig.CATALOG);
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName), nonExistentCatalogType);
+    conf.set(InputFormatConfig.catalogTypeConfigKey(catalogName), "fooType");
     AssertHelpers.assertThrows(
             "should complain about catalog not supported", UnsupportedOperationException.class,
             "Unknown catalog type:", () -> Catalogs.loadCatalog(conf, catalogName));
+  }
 
-    // arbitrary catalog name with hadoop catalog type and default warehouse location
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName),
-            CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
-    hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
-
+  @Test
+  public void testLoadCatalog_hadoop_warehouseLocation() {

Review comment:
       will do




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629239615



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }
 
-    // arbitrary catalog name with non existent catalog type
+  @Test
+  public void testLoadCatalog_unknown() {
     String catalogName = "barCatalog";
-    conf.unset(InputFormatConfig.CATALOG);
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName), nonExistentCatalogType);
+    conf.set(InputFormatConfig.catalogTypeConfigKey(catalogName), "fooType");
     AssertHelpers.assertThrows(
             "should complain about catalog not supported", UnsupportedOperationException.class,
             "Unknown catalog type:", () -> Catalogs.loadCatalog(conf, catalogName));
+  }
 
-    // arbitrary catalog name with hadoop catalog type and default warehouse location
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName),
-            CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
-    hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
-
+  @Test
+  public void testLoadCatalog_hadoop_warehouseLocation() {
+    String catalogName = "barCatalog";
+    conf.set(InputFormatConfig.catalogTypeConfigKey(catalogName), CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
+    conf.set(InputFormatConfig.catalogWarehouseConfigKey(catalogName), "/tmp/mylocation");
+    Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+    Assert.assertEquals("HadoopCatalog{name=barCatalog, location=/tmp/mylocation}", hadoopCatalog.get().toString());

Review comment:
       I think hardcoding the toString in the assert can become brittle and difficult to maintain over time. Shouldn't we add a `warehouseLocation()` method to `HadoopCatalog`, and then assert for `catalog.name()` and `((HadoopCatalog) catalog).warehouseLocation()` for the same effect?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633618454



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);
+  }
+
+  /**
+   * Get Hadoop config key of catalog warehouse location based on catalog name
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog warehouse location for the catalog name
+   */
+  public static String catalogWarehouseConfigKey(String catalogName) {

Review comment:
       Sure, I can remove that method




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r629278560



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }
 
-    // arbitrary catalog name with non existent catalog type
+  @Test
+  public void testLoadCatalog_unknown() {
     String catalogName = "barCatalog";
-    conf.unset(InputFormatConfig.CATALOG);
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName), nonExistentCatalogType);
+    conf.set(InputFormatConfig.catalogTypeConfigKey(catalogName), "fooType");
     AssertHelpers.assertThrows(
             "should complain about catalog not supported", UnsupportedOperationException.class,
             "Unknown catalog type:", () -> Catalogs.loadCatalog(conf, catalogName));
+  }
 
-    // arbitrary catalog name with hadoop catalog type and default warehouse location
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName),
-            CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
-    hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
-
+  @Test
+  public void testLoadCatalog_hadoop_warehouseLocation() {

Review comment:
       nit: can we follow the same order in the tests both for legacy and for non-legacy cases just for consistency? e.g. default, hive, hadoop, location, custom, unknown




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2565: Hive: unify catalog experience across engines

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2565:
URL: https://github.com/apache/iceberg/pull/2565#discussion_r633144895



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }
 
-    // arbitrary catalog name with non existent catalog type
+  @Test
+  public void testLoadCatalog_unknown() {
     String catalogName = "barCatalog";
-    conf.unset(InputFormatConfig.CATALOG);
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName), nonExistentCatalogType);
+    conf.set(InputFormatConfig.catalogTypeConfigKey(catalogName), "fooType");
     AssertHelpers.assertThrows(
             "should complain about catalog not supported", UnsupportedOperationException.class,
             "Unknown catalog type:", () -> Catalogs.loadCatalog(conf, catalogName));
+  }
 
-    // arbitrary catalog name with hadoop catalog type and default warehouse location
-    conf.set(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName),
-            CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
-    hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
-
+  @Test
+  public void testLoadCatalog_hadoop_warehouseLocation() {
+    String catalogName = "barCatalog";
+    conf.set(InputFormatConfig.catalogTypeConfigKey(catalogName), CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
+    conf.set(InputFormatConfig.catalogWarehouseConfigKey(catalogName), "/tmp/mylocation");
+    Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+    Assert.assertEquals("HadoopCatalog{name=barCatalog, location=/tmp/mylocation}", hadoopCatalog.get().toString());

Review comment:
       This was in the original test, so sorry I did not look too much into it. I agree that it's better to test with those methods, but unfortunately `HadoopCatalog.warehouseLocation()` does not exist, so we cannot really use that, so I suppose that is why this test is written in this way to get the warehouse location.

##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -267,8 +265,8 @@ public static boolean hiveCatalog(Configuration conf, Properties props) {
    */
   private static String getCatalogType(Configuration conf, String catalogName) {
     if (catalogName != null) {
-      String catalogType = conf.get(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName));
-      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME) || catalogType == null) {
+      String catalogType = conf.get(InputFormatConfig.catalogTypeConfigKey(catalogName));
+      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME)) {

Review comment:
       updated

##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);

Review comment:
       So while Ryan and Anton have not replied, I will keep it like this, we can always add `catalog-type` as a new catalog properties in a new PR.

##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }

Review comment:
       Actually, because the `CATALOG_LOADER_CLASS` (`iceberg.mr.catalog.loader.class`) is no longer used, we cannot load a custom catalog in the legacy mode, so there is no test for it. 
   
   Looking at https://github.com/apache/iceberg/pull/2129/files#diff-c183ea3aa154c2a5012f87d7a06dba3cff3f27975384e9fb4040fe6850a98bd6L192-L193, this seems like a backwards incompatible change. @lcspinter is this intentional, or should we add that part of the logic back?

##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);
+  }
+
+  /**
+   * Get Hadoop config key of catalog warehouse location based on catalog name
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog warehouse location for the catalog name
+   */
+  public static String catalogWarehouseConfigKey(String catalogName) {

Review comment:
       @pvary it is used in all catalogs except `HiveCatalog`, because Hive uses `HiveConf.ConfVars.METASTOREWAREHOUSE` instead.

##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java
##########
@@ -191,91 +191,103 @@ public void testCreateDropTableToCatalog() throws IOException {
   }
 
   @Test
-  public void testLoadCatalog() throws IOException {
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  public void testLoadCatalog_legacy_default() {
+    Optional<Catalog> defaultCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(defaultCatalog.isPresent());
+    Assert.assertTrue(defaultCatalog.get() instanceof HiveCatalog);
+  }
 
-    String nonExistentCatalogType = "fooType";
+  @Test
+  public void testLoadCatalog_legacy_hive() {
+    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
+    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
+    Assert.assertTrue(hiveCatalog.isPresent());
+    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, nonExistentCatalogType);
-    AssertHelpers.assertThrows(
-            "should complain about catalog not supported", UnsupportedOperationException.class,
-            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  @Test
+  public void testLoadCatalog_legacy_location() {
+    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
+    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  }
 
+  @Test
+  public void testLoadCatalog_legacy_hadoop() {
     conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HADOOP);
     conf.set(InputFormatConfig.HADOOP_CATALOG_WAREHOUSE_LOCATION, "/tmp/mylocation");
     Optional<Catalog> hadoopCatalog = Catalogs.loadCatalog(conf, null);
-
     Assert.assertTrue(hadoopCatalog.isPresent());
     Assert.assertTrue(hadoopCatalog.get() instanceof HadoopCatalog);
+  }
 
-    conf.set(InputFormatConfig.CATALOG, CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
-    Optional<Catalog> hiveCatalog = Catalogs.loadCatalog(conf, null);
-
-    Assert.assertTrue(hiveCatalog.isPresent());
-    Assert.assertTrue(hiveCatalog.get() instanceof HiveCatalog);
-
-    conf.set(InputFormatConfig.CATALOG, Catalogs.LOCATION);
-    Assert.assertFalse(Catalogs.loadCatalog(conf, null).isPresent());
+  @Test
+  public void testLoadCatalog_legacy_unknown() {
+    conf.set(InputFormatConfig.CATALOG, "fooType");
+    AssertHelpers.assertThrows(
+            "should complain about catalog not supported", UnsupportedOperationException.class,
+            "Unknown catalog type", () -> Catalogs.loadCatalog(conf, null));
+  }

Review comment:
       for now, I have updated the code to allow this legacy behavior, and added the missing test.

##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);

Review comment:
       In addition, I am treating `CatalogUtil.ICEBERG_CATALOG_TYPE` as the catalog property to use, and removed reference to `InputFormatConfig.CATALOG_TYPE_TEMPLATE`. I also added documentation inside `CatalogUtil` to make things clear.
   
   I don't personally like the fact that `CatalogUtil.ICEBERG_CATALOG_TYPE` is in `CatalogUtil` instead of `CatalogProperties`, but we already released it in 0.11 as public variables so I decide to just keep it this way.

##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -267,8 +265,8 @@ public static boolean hiveCatalog(Configuration conf, Properties props) {
    */
   private static String getCatalogType(Configuration conf, String catalogName) {
     if (catalogName != null) {
-      String catalogType = conf.get(String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName));
-      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME) || catalogType == null) {
+      String catalogType = conf.get(InputFormatConfig.catalogTypeConfigKey(catalogName));
+      if (catalogName.equals(ICEBERG_HADOOP_TABLE_NAME)) {

Review comment:
       I moved the doc of this method to `Catalogs` class level, because private method docs are not generated in html.

##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -203,6 +202,44 @@ public static Schema readSchema(Configuration conf) {
     return readColumns != null && readColumns.length > 0 ? readColumns : null;
   }
 
+  /**
+   * Get Hadoop config key of catalog type based on catalog name and {@link #CATALOG_TYPE_TEMPLATE}
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog type for the catalog name
+   */
+  public static String catalogTypeConfigKey(String catalogName) {
+    return String.format(InputFormatConfig.CATALOG_TYPE_TEMPLATE, catalogName);
+  }
+
+  /**
+   * Get Hadoop config key of catalog warehouse location based on catalog name
+   * @param catalogName catalog name
+   * @return Hadoop config key of catalog warehouse location for the catalog name
+   */
+  public static String catalogWarehouseConfigKey(String catalogName) {

Review comment:
       Sure, I can remove that method




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org