You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/14 02:18:18 UTC

[GitHub] [iceberg] jackye1995 opened a new pull request, #6586: AWS: make warehouse path optional for read only catalog use cases

jackye1995 opened a new pull request, #6586:
URL: https://github.com/apache/iceberg/pull/6586

   Currently no matter in what situation warehouse path must be specified, but in many cases the user just want to initialize Glue catalog to read data, and don't want to pass in a warehouse path. This is to support that use case without user having to specify a dummy warehouse path to bypass the checks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 merged pull request #6586: AWS: make warehouse path optional for read only catalog use cases

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 merged PR #6586:
URL: https://github.com/apache/iceberg/pull/6586


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aajisaka commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

Posted by GitBox <gi...@apache.org>.
aajisaka commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082123777


##########
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##########
@@ -132,6 +134,28 @@ public void testCreateTableBadName() {
                 TableIdentifier.of(namespace, "table-1"), schema, partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+    GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+    glueCatalogWithoutWarehouse.initialize(
+        catalogName,
+        null,
+        new AwsProperties(),
+        glue,
+        LockManagers.defaultLockManager(),
+        new S3FileIO(clientFactory::s3),
+        ImmutableMap.of());
+    String namespace = createNamespace();
+    String tableName = getRandomName();
+    TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+    try {
+      glueCatalog.createTable(identifier, schema, partitionSpec, tableLocationProperties);
+      glueCatalog.loadTable(identifier);
+    } catch (RuntimeException e) {
+      fail("Create and load table without warehouse location should succeed");

Review Comment:
   ```suggestion
         throw new IOException("Create and load table without warehouse location should succeed", e);
   ```
   We shouldn't hide the original exception. Without the exception message and the stacktrace, it's hard to find the root cause when an exception occur.



##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -177,13 +178,9 @@ void initialize(
       GlueClient client,
       LockManager lock,
       FileIO io) {
-    Preconditions.checkArgument(
-        path != null && path.length() > 0,
-        "Cannot initialize GlueCatalog because warehousePath must not be null or empty");
-
     this.catalogName = name;
     this.awsProperties = properties;
-    this.warehousePath = LocationUtil.stripTrailingSlash(path);
+    this.warehousePath = path != null ? LocationUtil.stripTrailingSlash(path) : null;

Review Comment:
   If the `path` is empty, it fails in `LocationUtil.stripTrailiingSlash(path)`
   ```suggestion
       this.warehousePath = (path != null && path.length() > 0) ? LocationUtil.stripTrailingSlash(path) : null;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082151825


##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -177,13 +178,9 @@ void initialize(
       GlueClient client,
       LockManager lock,
       FileIO io) {
-    Preconditions.checkArgument(
-        path != null && path.length() > 0,
-        "Cannot initialize GlueCatalog because warehousePath must not be null or empty");
-
     this.catalogName = name;
     this.awsProperties = properties;
-    this.warehousePath = LocationUtil.stripTrailingSlash(path);
+    this.warehousePath = path != null ? LocationUtil.stripTrailingSlash(path) : null;

Review Comment:
   good point, updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082168946


##########
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##########
@@ -132,6 +133,29 @@ public void testCreateTableBadName() {
                 TableIdentifier.of(namespace, "table-1"), schema, partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+    GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+    glueCatalogWithoutWarehouse.initialize(
+        catalogName,
+        null,
+        new AwsProperties(),
+        glue,
+        LockManagers.defaultLockManager(),
+        new S3FileIO(clientFactory::s3),
+        ImmutableMap.of());
+    String namespace = createNamespace();
+    String tableName = getRandomName();
+    TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+    try {

Review Comment:
   nit: is the try-catch really needed when we expect the test to succeed? Or is it more of a "give an additional hint when it fails"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082172027


##########
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##########
@@ -132,6 +133,29 @@ public void testCreateTableBadName() {
                 TableIdentifier.of(namespace, "table-1"), schema, partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+    GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+    glueCatalogWithoutWarehouse.initialize(
+        catalogName,
+        null,
+        new AwsProperties(),
+        glue,
+        LockManagers.defaultLockManager(),
+        new S3FileIO(clientFactory::s3),
+        ImmutableMap.of());
+    String namespace = createNamespace();
+    String tableName = getRandomName();
+    TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+    try {

Review Comment:
   yes, just try to make the failure reason more explicit



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #6586: AWS: make warehouse path optional for read only catalog use cases

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#issuecomment-1398781852

   Thanks for testing with Glue @aajisaka ! And thanks for the review @amogh-jahagirdar @nastra 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082152137


##########
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##########
@@ -132,6 +134,28 @@ public void testCreateTableBadName() {
                 TableIdentifier.of(namespace, "table-1"), schema, partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+    GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+    glueCatalogWithoutWarehouse.initialize(
+        catalogName,
+        null,
+        new AwsProperties(),
+        glue,
+        LockManagers.defaultLockManager(),
+        new S3FileIO(clientFactory::s3),
+        ImmutableMap.of());
+    String namespace = createNamespace();
+    String tableName = getRandomName();
+    TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+    try {
+      glueCatalog.createTable(identifier, schema, partitionSpec, tableLocationProperties);
+      glueCatalog.loadTable(identifier);
+    } catch (RuntimeException e) {
+      fail("Create and load table without warehouse location should succeed");

Review Comment:
   yes, I was not able to find a `fail(...)` method that allows taking exception, thus this. Throwing IO exception seems too much, I changed the code to rethrow the runtime exception.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082174852


##########
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##########
@@ -132,6 +133,29 @@ public void testCreateTableBadName() {
                 TableIdentifier.of(namespace, "table-1"), schema, partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+    GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+    glueCatalogWithoutWarehouse.initialize(
+        catalogName,
+        null,
+        new AwsProperties(),
+        glue,
+        LockManagers.defaultLockManager(),
+        new S3FileIO(clientFactory::s3),
+        ImmutableMap.of());
+    String namespace = createNamespace();
+    String tableName = getRandomName();
+    TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+    try {

Review Comment:
   thanks, that makes sense in this case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #6586: AWS: make warehouse path optional for read only catalog use cases

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#issuecomment-1397395046

   @aajisaka can you also take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org