Posted to issues@iceberg.apache.org by "szehon-ho (via GitHub)" <gi...@apache.org> on 2023/05/01 18:46:22 UTC

[GitHub] [iceberg] szehon-ho opened a new pull request, #7441: Hive: Support connecting to multiple Hive-Catalog

szehon-ho opened a new pull request, #7441:
URL: https://github.com/apache/iceberg/pull/7441

   _Note: As the term Hive-Catalog is overloaded across projects, I'll use "Iceberg-HiveCatalog" to refer to Iceberg's HiveCatalog class and "HMS Catalog" to refer to the Hive-specific concept._
   
   https://issues.apache.org/jira/browse/HIVE-18685 added support for HMS catalogs, i.e. separate namespaces for databases/tables within the metastore.
   
   However, the Iceberg-HiveCatalog uses a global cache of HMS connections keyed on the Hive Metastore URI. This prevents different Iceberg-HiveCatalogs in the same JVM from talking to different HMS Catalogs on the same HMS. See: https://github.com/apache/iceberg/pull/5378. This matters, for example, in Spark, where a user may configure several SparkCatalogs, each pointing to a separate HMS Catalog on the same HMS.
   
   This change takes advantage of https://github.com/apache/iceberg/pull/6698 to fix this.  It does two things:
   
   - Adds a 'hive-catalog' flag on Iceberg-HiveCatalog (similar to "uri" and "warehouse"), which automatically sets the corresponding config on the HMS client so that it points to the right HMS Catalog.
   - Adds the HMS client config "metastore.catalog.default" to the list of HMS client cache keys.
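To make the second change concrete, here is a minimal sketch of why the cache key matters. It assumes a JVM-wide cache that maps a key to a client and models the Hadoop configuration as a plain `Map`; `ClientCacheKeySketch` and `FakeHmsClient` are hypothetical names, and Iceberg's real client pool (which caches pools of connections, not single clients) is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not Iceberg's actual client pool): a JVM-wide client
// cache whose key is built from a configurable list of config entries.
public class ClientCacheKeySketch {

  static final String URI_KEY = "hive.metastore.uris";
  static final String CATALOG_KEY = "metastore.catalog.default";

  // Stand-in for a real HMS client; it only remembers what it was configured with.
  static final class FakeHmsClient {
    final String uri;
    final String hmsCatalog;

    FakeHmsClient(String uri, String hmsCatalog) {
      this.uri = uri;
      this.hmsCatalog = hmsCatalog;
    }
  }

  private final List<String> keyConfs;
  private final Map<List<String>, FakeHmsClient> cache = new ConcurrentHashMap<>();

  ClientCacheKeySketch(List<String> keyConfs) {
    this.keyConfs = keyConfs;
  }

  // The cache key is the list of values of the selected config entries.
  List<String> keyFor(Map<String, String> conf) {
    List<String> key = new ArrayList<>();
    for (String name : keyConfs) {
      key.add(conf.getOrDefault(name, ""));
    }
    return key;
  }

  // Reuse the cached client for this key, or create one from this conf.
  FakeHmsClient clientFor(Map<String, String> conf) {
    return cache.computeIfAbsent(
        keyFor(conf), k -> new FakeHmsClient(conf.get(URI_KEY), conf.get(CATALOG_KEY)));
  }

  public static void main(String[] args) {
    Map<String, String> cat1 = Map.of(URI_KEY, "thrift://hms:9083", CATALOG_KEY, "foo");
    Map<String, String> cat2 = Map.of(URI_KEY, "thrift://hms:9083", CATALOG_KEY, "bar");

    // Keyed on the URI only: cat2 silently reuses the client created for "foo".
    ClientCacheKeySketch uriOnly = new ClientCacheKeySketch(List.of(URI_KEY));
    uriOnly.clientFor(cat1);
    System.out.println(uriOnly.clientFor(cat2).hmsCatalog); // prints foo

    // Keyed on URI + metastore.catalog.default: each HMS Catalog gets its own client.
    ClientCacheKeySketch uriAndCatalog = new ClientCacheKeySketch(List.of(URI_KEY, CATALOG_KEY));
    uriAndCatalog.clientFor(cat1);
    System.out.println(uriAndCatalog.clientFor(cat2).hmsCatalog); // prints bar
  }
}
```

The only difference between the two caches is which config entries feed the key; once "metastore.catalog.default" is part of it, two catalogs sharing a URI no longer collide.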
   
   From Spark's point of view, users can configure 'spark.sql.catalog.$cat1.hive-catalog=foo' and 'spark.sql.catalog.$cat2.hive-catalog=bar'.
   
   Note: @RussellSpitzer pointed out to me that setting spark.sql.catalog.$myCatalog.hadoop.metastore.catalog.default=foo already seems to work for configuring each SparkCatalog. But it is a bit clunky and undocumented, hence the proposal to add the 'hive-catalog' shorthand.
   
   In either case, the second change is needed so that different cached connections to the same HMS can go to different HMS Catalogs.
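The `hadoop.` passthrough from the note above can be illustrated with a small sketch. It assumes, as a simplification, that each `spark.sql.catalog.<name>.hadoop.<key>=<value>` entry ends up as `<key>=<value>` in that catalog's own Hadoop configuration; `HadoopOverridesSketch` and `hadoopOverrides` are hypothetical names, not Iceberg's actual Spark code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: extract per-catalog Hadoop overrides from Spark properties.
public class HadoopOverridesSketch {

  // Collect spark.sql.catalog.<catalogName>.hadoop.<key> entries as <key> -> value.
  static Map<String, String> hadoopOverrides(Map<String, String> sparkConf, String catalogName) {
    // The trailing ".hadoop." keeps "cat1" from also matching "cat10".
    String prefix = "spark.sql.catalog." + catalogName + ".hadoop.";
    Map<String, String> overrides = new TreeMap<>();
    for (Map.Entry<String, String> e : sparkConf.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        overrides.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return overrides;
  }

  public static void main(String[] args) {
    // Two Iceberg SparkCatalogs on the same HMS, each scoped to its own HMS Catalog.
    Map<String, String> sparkConf = new HashMap<>();
    sparkConf.put("spark.sql.catalog.cat1", "org.apache.iceberg.spark.SparkCatalog");
    sparkConf.put("spark.sql.catalog.cat1.type", "hive");
    sparkConf.put("spark.sql.catalog.cat1.uri", "thrift://hms:9083");
    sparkConf.put("spark.sql.catalog.cat1.hadoop.metastore.catalog.default", "foo");
    sparkConf.put("spark.sql.catalog.cat2", "org.apache.iceberg.spark.SparkCatalog");
    sparkConf.put("spark.sql.catalog.cat2.type", "hive");
    sparkConf.put("spark.sql.catalog.cat2.uri", "thrift://hms:9083");
    sparkConf.put("spark.sql.catalog.cat2.hadoop.metastore.catalog.default", "bar");

    System.out.println(hadoopOverrides(sparkConf, "cat1")); // prints {metastore.catalog.default=foo}
    System.out.println(hadoopOverrides(sparkConf, "cat2")); // prints {metastore.catalog.default=bar}
  }
}
```

Each catalog's client config then differs only in "metastore.catalog.default", which is why that entry also has to be part of the client cache key.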
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7441:
URL: https://github.com/apache/iceberg/pull/7441#issuecomment-1526977029

   Thanks. Yes, it depends on the engine; I believe Spark has this support.


[GitHub] [iceberg] szehon-ho commented on pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7441:
URL: https://github.com/apache/iceberg/pull/7441#issuecomment-1530448854

   The last test cleanup is not related to the failure (it is just good to clear state for the next test). Thanks @pvary @hililiwei for the review.


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7441:
URL: https://github.com/apache/iceberg/pull/7441#discussion_r1179924760


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -64,10 +64,15 @@ public class HiveCatalog extends BaseMetastoreCatalog implements SupportsNamespa
   public static final String LIST_ALL_TABLES = "list-all-tables";
   public static final String LIST_ALL_TABLES_DEFAULT = "false";
 
+  public static final String HMS_CATALOG = "hive-catalog";

Review Comment:
   Actually, I decided not to add this extra config for now, as users can use spark.sql.catalog.$myCatalog.hadoop.metastore.catalog.default. We can revisit this later, but for now there is no need to document it.


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7441:
URL: https://github.com/apache/iceberg/pull/7441#discussion_r1179924760


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -64,10 +64,15 @@ public class HiveCatalog extends BaseMetastoreCatalog implements SupportsNamespa
   public static final String LIST_ALL_TABLES = "list-all-tables";
   public static final String LIST_ALL_TABLES_DEFAULT = "false";
 
+  public static final String HMS_CATALOG = "hive-catalog";

Review Comment:
   Actually, I decided not to add this extra config for now, as users can use spark.sql.catalog.$myCatalog.hadoop.metastore.catalog.default. We can revisit this later, but for now no new flag is introduced.


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7441:
URL: https://github.com/apache/iceberg/pull/7441#discussion_r1179435649


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -64,10 +64,15 @@ public class HiveCatalog extends BaseMetastoreCatalog implements SupportsNamespa
   public static final String LIST_ALL_TABLES = "list-all-tables";
   public static final String LIST_ALL_TABLES_DEFAULT = "false";
 
+  public static final String HMS_CATALOG = "hive-catalog";

Review Comment:
   Good point, will do in a follow-up PR.


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7441:
URL: https://github.com/apache/iceberg/pull/7441#discussion_r1179924760


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -64,10 +64,15 @@ public class HiveCatalog extends BaseMetastoreCatalog implements SupportsNamespa
   public static final String LIST_ALL_TABLES = "list-all-tables";
   public static final String LIST_ALL_TABLES_DEFAULT = "false";
 
+  public static final String HMS_CATALOG = "hive-catalog";

Review Comment:
   Actually, I decided not to add this extra config for now, as users can use spark.sql.catalog.$myCatalog.hadoop.metastore.catalog.default. We can revisit this later, but for now there is no need to document it, as it's just the Hive flag.


[GitHub] [iceberg] pvary commented on a diff in pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on code in PR #7441:
URL: https://github.com/apache/iceberg/pull/7441#discussion_r1178614383


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -64,10 +64,15 @@ public class HiveCatalog extends BaseMetastoreCatalog implements SupportsNamespa
   public static final String LIST_ALL_TABLES = "list-all-tables";
   public static final String LIST_ALL_TABLES_DEFAULT = "false";
 
+  public static final String HMS_CATALOG = "hive-catalog";

Review Comment:
   Do we have to document this?


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7441:
URL: https://github.com/apache/iceberg/pull/7441#discussion_r1179435649


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -64,10 +64,15 @@ public class HiveCatalog extends BaseMetastoreCatalog implements SupportsNamespa
   public static final String LIST_ALL_TABLES = "list-all-tables";
   public static final String LIST_ALL_TABLES_DEFAULT = "false";
 
+  public static final String HMS_CATALOG = "hive-catalog";

Review Comment:
   Will do in a follow-up PR, thanks.


[GitHub] [iceberg] szehon-ho merged pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho merged PR #7441:
URL: https://github.com/apache/iceberg/pull/7441


[GitHub] [iceberg] szehon-ho commented on pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7441:
URL: https://github.com/apache/iceberg/pull/7441#issuecomment-1530063887

   The test failure looks unrelated; kicking off the build again.


[GitHub] [iceberg] szehon-ho closed pull request #7441: Hive: Support connecting to multiple Hive-Catalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho closed pull request #7441: Hive: Support connecting to multiple Hive-Catalog
URL: https://github.com/apache/iceberg/pull/7441
