You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/24 10:14:01 UTC

[GitHub] [iceberg] pvary opened a new pull request #1240: Default warehouse location of a table should be a subdirectory in database location

pvary opened a new pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240


   Pull request for issue for issue #954 
   
   - Fixes HiveCatalog.defaultWarehouseLocation method
   - Creates a test for it
   - A simple formatting fix
   - Moves an unnecessary SD conversion


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#issuecomment-663605907


   Looks great, @pvary! Just a couple of minor things to fix. Thanks for contributing!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] XuQianJin-Stars commented on pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars commented on pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#issuecomment-740014725


   > > If it is set to false here, there will be a situation where the hdfs file directory will not be deleted when the flink sql client is deleting the database. What is the reason for this setting to false?
   > 
   > @XuQianJin-Stars: This PR has not changed the behavior of dropping namespaces. The highlighted code was added with #919
   > 
   > Independently of this IMO the best place for these questions would be the dev list, but I guess that we wanted to be conservative about dropping a namespace to prevent accidentally removing wrong directories and with them important data.
   
   Thank you, I understand what you mean, do we need to expose the parameter option to the user, let the user decide whether to really delete the database? And the default option is not to delete data.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on a change in pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#discussion_r460572363



##########
File path: hive/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -376,6 +376,28 @@ public TableOperations newTableOps(TableIdentifier tableIdentifier) {
 
   @Override
   protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) {
+    // This is a little edgy since we basically duplicate the HMS location generation logic.
+    // Sadly I do not see a good way around this if we want to keep the order of events, like:
+    // - Create meta files
+    // - Create the metadata in HMS, and thous commit the changes

Review comment:
       Nit: Typo in `thous`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#discussion_r460738276



##########
File path: hive/src/test/java/org/apache/iceberg/hive/HiveTableBaseTest.java
##########
@@ -66,6 +66,7 @@ public void dropTestTable() throws Exception {
     tableLocation.getFileSystem(hiveConf).delete(tableLocation, true);
     catalog.dropTable(TABLE_IDENTIFIER, false /* metadata only, location was already deleted */);
   }
+

Review comment:
       Done

##########
File path: hive/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -376,6 +376,28 @@ public TableOperations newTableOps(TableIdentifier tableIdentifier) {
 
   @Override
   protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) {
+    // This is a little edgy since we basically duplicate the HMS location generation logic.
+    // Sadly I do not see a good way around this if we want to keep the order of events, like:
+    // - Create meta files
+    // - Create the metadata in HMS, and thous commit the changes

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#discussion_r460738599



##########
File path: hive/src/test/java/org/apache/iceberg/hive/HiveTableTest.java
##########
@@ -295,4 +302,29 @@ public void testListTables() {
     Assert.assertEquals(1, expectedIdents.size());
     Assert.assertTrue(catalog.tableExists(TABLE_IDENTIFIER));
   }
+
+  @Test
+  public void testNonDefaultDatabaseLocation() throws IOException, TException {
+    // Create a new location and a non-default database / namespace for it
+    File nonDefaultLocation = createTempDirectory(NON_DEFAULT_DATABASE,
+        asFileAttribute(fromString("rwxrwxrwx"))).toFile();
+    Database database = new Database();
+
+    database.setName(NON_DEFAULT_DATABASE);
+    database.setLocationUri(nonDefaultLocation.getPath());
+    metastoreClient.createDatabase(database);

Review comment:
       Sure! Updated.
   Thanks for the review!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#issuecomment-740008622


   > If it is set to false here, there will be a situation where the hdfs file directory will not be deleted when the flink sql client is deleting the database. What is the reason for this setting to false?
   
   @XuQianJin-Stars: This PR has not changed the behavior of dropping namespaces. The highlighted code was added with #919
   
   Independently of this IMO the best place for these questions would be the dev list, but I guess that we wanted to be conservative about dropping a namespace to prevent accidentally removing wrong directories and with them important data.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#discussion_r460137655



##########
File path: hive/src/test/java/org/apache/iceberg/hive/HiveTableBaseTest.java
##########
@@ -66,6 +66,7 @@ public void dropTestTable() throws Exception {
     tableLocation.getFileSystem(hiveConf).delete(tableLocation, true);
     catalog.dropTable(TABLE_IDENTIFIER, false /* metadata only, location was already deleted */);
   }
+

Review comment:
       Nit: this file doesn't need to change. Can you revert this to avoid git conflicts?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] XuQianJin-Stars commented on pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars commented on pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#issuecomment-739895879


   hi @pvary 
   ![image](https://user-images.githubusercontent.com/10494131/101352197-bbca1900-38cc-11eb-9128-7c669bd50fdb.png)
   If it is set to false here, there will be a situation where the hdfs file directory will not be deleted when the flink sql client is deleting the database. What is the reason for this setting to false?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#discussion_r460138461



##########
File path: hive/src/test/java/org/apache/iceberg/hive/HiveTableTest.java
##########
@@ -295,4 +302,29 @@ public void testListTables() {
     Assert.assertEquals(1, expectedIdents.size());
     Assert.assertTrue(catalog.tableExists(TABLE_IDENTIFIER));
   }
+
+  @Test
+  public void testNonDefaultDatabaseLocation() throws IOException, TException {
+    // Create a new location and a non-default database / namespace for it
+    File nonDefaultLocation = createTempDirectory(NON_DEFAULT_DATABASE,
+        asFileAttribute(fromString("rwxrwxrwx"))).toFile();
+    Database database = new Database();
+
+    database.setName(NON_DEFAULT_DATABASE);
+    database.setLocationUri(nonDefaultLocation.getPath());
+    metastoreClient.createDatabase(database);

Review comment:
       The catalog supports creating a namespace, so I think this test would be better if it used the catalog to create the namespace with a location ("location" property) instead of going through the metastore client directly. Could you update it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #1240: Default warehouse location of a table should be a subdirectory in database location

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #1240:
URL: https://github.com/apache/iceberg/pull/1240#issuecomment-665000943


   Thanks @rdblue, @shardulm94 for the review and the push!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org