Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/03 15:55:37 UTC

[GitHub] [iceberg] dimas-b commented on a diff in pull request #4826: Nessie: Use unique path for different table with same name

dimas-b commented on code in PR #4826:
URL: https://github.com/apache/iceberg/pull/4826#discussion_r1013017903


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -199,10 +200,16 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
 
   @Override
   protected String defaultWarehouseLocation(TableIdentifier table) {
+    String location;
     if (table.hasNamespace()) {
-      return SLASH.join(warehouseLocation, table.namespace().toString(), table.name());
+      location = SLASH.join(warehouseLocation, table.namespace().toString(), table.name());
+    } else {
+      location = SLASH.join(warehouseLocation, table.name());
     }
-    return SLASH.join(warehouseLocation, table.name());
+    // Different tables with same table name can exist across references in Nessie.
+    // To avoid sharing same table path between two tables with same name, use uuid in the table
+    // path.
+    return location + "_" + UUID.randomUUID();

Review Comment:
   This will make `defaultWarehouseLocation` return a different value on every call, even for the same table identifier.
   
   IMHO, it would be preferable to make a slightly larger refactoring in `BaseMetastoreCatalog` and perhaps rename this method to `chooseNewWarehouseLocation(TableIdentifier table)` to make it clearer that the location is not 1:1 with table ID.
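   
   A minimal sketch of the concern, assuming a simplified stand-in that takes plain strings instead of a `TableIdentifier`. The class `LocationSketch` and its string-based signature are hypothetical and are not part of the Iceberg API; only the `_` + `UUID.randomUUID()` suffix mirrors the diff above.

```java
import java.util.UUID;

class LocationSketch {
  // Hypothetical stand-in for the patched method: builds the base path the same
  // way the diff does, then appends a fresh random UUID on every call.
  static String chooseNewWarehouseLocation(String warehouse, String namespace, String table) {
    String base =
        namespace.isEmpty()
            ? warehouse + "/" + table
            : warehouse + "/" + namespace + "/" + table;
    return base + "_" + UUID.randomUUID();
  }

  public static void main(String[] args) {
    String first = chooseNewWarehouseLocation("/wh", "db", "table1");
    String second = chooseNewWarehouseLocation("/wh", "db", "table1");
    // Both values share the "/wh/db/table1_" prefix but carry different suffixes,
    // so a caller cannot treat the result as a stable default for the identifier.
    System.out.println(first);
    System.out.println(second);
  }
}
```

   With this behavior, any caller that invokes the method twice for the same identifier and expects the same path gets two unrelated locations, which is why a name such as `chooseNewWarehouseLocation` may describe the contract better than `defaultWarehouseLocation`.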



##########
nessie/src/test/java/org/apache/iceberg/nessie/TestBranchVisibility.java:
##########
@@ -484,4 +484,32 @@ public void testWithRefAndHash() throws NessieConflictException, NessieNotFoundE
     nessieCatalog.createTable(identifier2, schema);
     Assertions.assertThat(nessieCatalog.listTables(namespace)).hasSize(2);
   }
+
+  @Test
+  public void testDifferentTableSameName() throws NessieConflictException, NessieNotFoundException {
+    String branch1 = "branch1";
+    String branch2 = "branch2";
+    createBranch(branch1, null);
+    createBranch(branch2, null);
+    Schema schema1 =
+        new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields());
+    Schema schema2 =
+        new Schema(
+            Types.StructType.of(
+                    required(1, "file_count", Types.IntegerType.get()),
+                    required(2, "record_count", Types.LongType.get()))
+                .fields());
+
+    TableIdentifier identifier = TableIdentifier.of("db.", "table1");

Review Comment:
   Since the intent is obviously not to test special characters like `.` in table IDs, I'd go with plain `"db"`.



##########
nessie/src/test/java/org/apache/iceberg/nessie/BaseTestIceberg.java:
##########
@@ -80,7 +80,7 @@ public abstract class BaseTestIceberg {
 
   private static final Logger LOG = LoggerFactory.getLogger(BaseTestIceberg.class);
 
-  @TempDir public Path temp;
+  @TempDir public File temp;

Review Comment:
   +1 to revert since this refactoring is not really needed to fix the basic problem in this PR. I'd suggest moving the refactoring to a separate PR (if people still want it).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

