You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/25 09:46:14 UTC

[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #6045: [iceberg-hive-metastore] Add support for group ownership

gaborkaszab commented on code in PR #6045:
URL: https://github.com/apache/iceberg/pull/6045#discussion_r1004161902


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -532,13 +536,27 @@ Database convertToDatabase(Namespace namespace, Map<String, String> meta) {
 
     database.setName(namespace.level(0));
     database.setLocationUri(databaseLocation(namespace.level(0)));
+    // default ownership is determined from Hadoop.UGI.currentUser
+    // and the owner type is default to individual user
+    try {
+      // getShortUserName will remove the realm part of the whole username
+      database.setOwnerName(UserGroupInformation.getCurrentUser().getShortUserName());

Review Comment:
   I don't know how expensive it is to get the current user from UserGroupInformation (taking into account it's Hadoop functionality) but I think we should do this only if there is no HMS_DB_OWNER set in 'meta'.



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -532,13 +536,27 @@ Database convertToDatabase(Namespace namespace, Map<String, String> meta) {
 
     database.setName(namespace.level(0));
     database.setLocationUri(databaseLocation(namespace.level(0)));
+    // default ownership is determined from Hadoop.UGI.currentUser
+    // and the owner type is default to individual user
+    try {
+      // getShortUserName will remove the realm part of the whole username
+      database.setOwnerName(UserGroupInformation.getCurrentUser().getShortUserName());
+      database.setOwnerType(PrincipalType.USER);
+    } catch (IOException e) {
+      LOG.error("Unable to determine the current UGI user", e);
+      throw new RuntimeException(e);

Review Comment:
   If there is an exception, can't we fallback to the original behaviour of not setting these fields?



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -422,11 +425,22 @@ private Table newHmsTable(TableMetadata metadata) {
     Preconditions.checkNotNull(metadata, "'metadata' parameter can't be null");
     final long currentTimeMillis = System.currentTimeMillis();
 
+    String defaultUser;
+    // default ownership is determined from Hadoop.UGI.currentUser

Review Comment:
   nit: line could be up to 100 chars long



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -532,13 +536,27 @@ Database convertToDatabase(Namespace namespace, Map<String, String> meta) {
 
     database.setName(namespace.level(0));
     database.setLocationUri(databaseLocation(namespace.level(0)));
+    // default ownership is determined from Hadoop.UGI.currentUser

Review Comment:
   nit: I think the standard line length is 100 chars



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -422,11 +425,22 @@ private Table newHmsTable(TableMetadata metadata) {
     Preconditions.checkNotNull(metadata, "'metadata' parameter can't be null");
     final long currentTimeMillis = System.currentTimeMillis();
 
+    String defaultUser;
+    // default ownership is determined from Hadoop.UGI.currentUser
+    // and the owner type is default to individual user
+    try {
+      // getShortUserName will remove the realm part of the whole username
+      defaultUser = UserGroupInformation.getCurrentUser().getShortUserName();

Review Comment:
   Is this UserGroupInformation.getCurrentUser() always available?
   
   Is that different from System.getProperty("user.name")? Could you help me understand what is the benefit of using this instead? What I have in mind here is that in case UGI gives a different user than what we get now from the system then this would be a breaking change (except if we consider this a bug fix). And if it gives the same user then there is no point of this part of the change. I might miss something here, though.



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -422,11 +425,22 @@ private Table newHmsTable(TableMetadata metadata) {
     Preconditions.checkNotNull(metadata, "'metadata' parameter can't be null");
     final long currentTimeMillis = System.currentTimeMillis();
 
+    String defaultUser;
+    // default ownership is determined from Hadoop.UGI.currentUser
+    // and the owner type is default to individual user
+    try {
+      // getShortUserName will remove the realm part of the whole username
+      defaultUser = UserGroupInformation.getCurrentUser().getShortUserName();
+    } catch (IOException e) {
+      LOG.error("Unable to determine the current UGI user", e);
+      throw new RuntimeException(e);

Review Comment:
   Similarly to the DB code, in case of an exception, can't we fallback to the original behaviour?



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -437,6 +451,18 @@ private Table newHmsTable(TableMetadata metadata) {
             null,
             TableType.EXTERNAL_TABLE.toString());
 
+    // hive2 does not support setting table ownership type, therefore we leave it as
+    // a table property. Once upgraded to hive3, then we can simply set
+    // newTable.setOwnerType(PrincipalType.valueOf(metadata.get(HMS_TABLE_OWNER_TYPE)))
+    String ownerType = metadata.property(TableProperties.HMS_TABLE_OWNER_TYPE, null);
+    if (ownerType != null && ownerType.equalsIgnoreCase(PrincipalType.GROUP.name())) {
+      newTable
+          .getParameters()
+          .put(TableProperties.HMS_TABLE_OWNER_TYPE, PrincipalType.GROUP.name());

Review Comment:
   Cant you simply put ownerType instead of PrincipalType.GROUP.name()?



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -437,6 +451,18 @@ private Table newHmsTable(TableMetadata metadata) {
             null,
             TableType.EXTERNAL_TABLE.toString());
 
+    // hive2 does not support setting table ownership type, therefore we leave it as

Review Comment:
   I'm not really sure about this approach. If we can't add ownership type into HMS and have to use a table property for this purpose would the clients of Iceberg respect this table property or would it need extra implementation to get the ownership type from the property?



##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##########
@@ -358,6 +388,36 @@ public void testCreateNamespace() throws Exception {
         "There no same location for db and namespace", database2.getLocationUri(), hiveLocalDir);
   }
 
+  @Test
+  public void testCreateNamespaceWithOwnership() throws Exception {
+    Namespace namespace1 = Namespace.of("userOwnership");
+    Map<String, String> prop1 =
+        ImmutableMap.of(
+            TableProperties.HMS_DB_OWNER,
+            "apache",
+            TableProperties.HMS_DB_OWNER_TYPE,
+            PrincipalType.USER.name());
+    catalog.createNamespace(namespace1, prop1);
+    Database database1 = metastoreClient.getDatabase(namespace1.toString());
+
+    Assert.assertEquals("apache", database1.getOwnerName());
+    Assert.assertEquals(PrincipalType.USER, database1.getOwnerType());
+
+    Map<String, String> prop2 =
+        ImmutableMap.of(
+            TableProperties.HMS_DB_OWNER,
+            "iceberg",
+            TableProperties.HMS_DB_OWNER_TYPE,
+            PrincipalType.GROUP.name());
+    Namespace namespace2 = Namespace.of("groupOwnership");
+
+    catalog.createNamespace(namespace2, prop2);
+    Database database2 = metastoreClient.getDatabase(namespace2.toString());
+
+    Assert.assertEquals("iceberg", database2.getOwnerName());
+    Assert.assertEquals(PrincipalType.GROUP, database2.getOwnerType());
+  }

Review Comment:
   Can we also add a check to verify that PrincipalType.USER is the default when we don't provide the ownership type in parameters?



##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##########
@@ -270,9 +275,34 @@ public void testCreateTableWithOwner() throws Exception {
       Assert.assertEquals(owner, hmsTable.getOwner());
       Map<String, String> hmsTableParams = hmsTable.getParameters();
       Assert.assertFalse(hmsTableParams.containsKey(TableProperties.HMS_TABLE_OWNER));
+      Assert.assertEquals(
+          PrincipalType.USER.name(), hmsTableParams.get(TableProperties.HMS_TABLE_OWNER_TYPE));
     } finally {
       catalog.dropTable(tableIdent);
     }
+
+    TableIdentifier tableIdent2 = TableIdentifier.of(DB_NAME, "tbl_group_owned");

Review Comment:
   The below part of this test is almost identical to the part above except the table properties. Would it make sense to add a helper function that executes the test based on different properties that it gets as a parameter?



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -422,11 +425,22 @@ private Table newHmsTable(TableMetadata metadata) {
     Preconditions.checkNotNull(metadata, "'metadata' parameter can't be null");
     final long currentTimeMillis = System.currentTimeMillis();
 
+    String defaultUser;
+    // default ownership is determined from Hadoop.UGI.currentUser
+    // and the owner type is default to individual user
+    try {
+      // getShortUserName will remove the realm part of the whole username
+      defaultUser = UserGroupInformation.getCurrentUser().getShortUserName();

Review Comment:
   Similarly to the DB code, if there is the HMS_TABLE_OWNER property set then we shouldn't call this Hadoop function.



##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##########
@@ -270,9 +275,34 @@ public void testCreateTableWithOwner() throws Exception {
       Assert.assertEquals(owner, hmsTable.getOwner());
       Map<String, String> hmsTableParams = hmsTable.getParameters();
       Assert.assertFalse(hmsTableParams.containsKey(TableProperties.HMS_TABLE_OWNER));
+      Assert.assertEquals(
+          PrincipalType.USER.name(), hmsTableParams.get(TableProperties.HMS_TABLE_OWNER_TYPE));
     } finally {
       catalog.dropTable(tableIdent);
     }
+
+    TableIdentifier tableIdent2 = TableIdentifier.of(DB_NAME, "tbl_group_owned");
+    String location2 = temp.newFolder("tbl_group_owned").toString();
+    String owner2 = "some_group_owner";
+    ImmutableMap<String, String> properties2 =
+        ImmutableMap.of(
+            TableProperties.HMS_TABLE_OWNER,
+            owner2,
+            TableProperties.HMS_TABLE_OWNER_TYPE,
+            PrincipalType.GROUP.name());
+
+    try {
+      Table table2 = catalog.createTable(tableIdent2, schema, spec, location2, properties2);
+      org.apache.hadoop.hive.metastore.api.Table hmsTable =
+          metastoreClient.getTable(DB_NAME, "tbl_group_owned");
+      Assert.assertEquals(owner2, hmsTable.getOwner());
+      Map<String, String> hmsTableParams = hmsTable.getParameters();
+      Assert.assertFalse(hmsTableParams.containsKey(TableProperties.HMS_TABLE_OWNER));
+      Assert.assertEquals(
+          PrincipalType.GROUP.name(), hmsTableParams.get(TableProperties.HMS_TABLE_OWNER_TYPE));
+    } finally {
+      catalog.dropTable(tableIdent2);
+    }

Review Comment:
   I wonder if it would make sense to check the owner when no table property is given.



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -437,6 +451,18 @@ private Table newHmsTable(TableMetadata metadata) {
             null,
             TableType.EXTERNAL_TABLE.toString());
 
+    // hive2 does not support setting table ownership type, therefore we leave it as
+    // a table property. Once upgraded to hive3, then we can simply set
+    // newTable.setOwnerType(PrincipalType.valueOf(metadata.get(HMS_TABLE_OWNER_TYPE)))
+    String ownerType = metadata.property(TableProperties.HMS_TABLE_OWNER_TYPE, null);
+    if (ownerType != null && ownerType.equalsIgnoreCase(PrincipalType.GROUP.name())) {
+      newTable
+          .getParameters()
+          .put(TableProperties.HMS_TABLE_OWNER_TYPE, PrincipalType.GROUP.name());
+    } else { // default: individual ownership

Review Comment:
   I think instead of this else branch you could pass PrincipalType.USER.name() to metadata.property() instead of null as a second parameter in L457 that is in fact the default value when the first param is not a valid key.



##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##########
@@ -261,7 +262,11 @@ public void testCreateTableWithOwner() throws Exception {
     String location = temp.newFolder("tbl").toString();
     String owner = "some_owner";
     ImmutableMap<String, String> properties =
-        ImmutableMap.of(TableProperties.HMS_TABLE_OWNER, owner);
+        ImmutableMap.of(
+            TableProperties.HMS_TABLE_OWNER,
+            owner,
+            TableProperties.HMS_TABLE_OWNER_TYPE,

Review Comment:
   I'd also keep the test coverage when there is no HMS_TABLE_OWNER_TYPE is set.



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -532,13 +536,27 @@ Database convertToDatabase(Namespace namespace, Map<String, String> meta) {
 
     database.setName(namespace.level(0));
     database.setLocationUri(databaseLocation(namespace.level(0)));
+    // default ownership is determined from Hadoop.UGI.currentUser

Review Comment:
   As I see it's not just createNamespace() that calls this convertToDatabase() function but it's also used in HiveCatalog.setProperties() and removeProperties(). I'm no expert here but what if user-A calls createNamespace() and then user-B calls setProperties() (but not changing the DB owner) then won't we end up unintentionally changing the DB owner as a result of convertToDatabase() setting the owner during a setProperties() as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org