You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "ajantha-bhat (via GitHub)" <gi...@apache.org> on 2023/03/02 05:25:37 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6789: Nessie: Add refresh for catalog APIs

ajantha-bhat opened a new pull request, #6789:
URL: https://github.com/apache/iceberg/pull/6789

   Consider the scenario,
   a) client1 - java Nessie catalog client creates a table1
   b) client2 - spark + iceberg+ Nessie creates a table2
   c) client3 - another java Nessie catalog client creates a table3
   
   where all the 3 clients are connected to the same Nessie server. 
   
   Now for the first client (client1), list tables show only one table even though all 3 clients are connected to the same Nessie server. 
   The reason is only the catalog APIs that involves table operations (like `loadTable()`, `commit()`, etc) refresh the `NessieIcebergClient`. The APIs like `listTables()`, `listNamespaces()` use the old state as it doesn't use table operations. 
   
   Hence, this PR adds a refresh to those APIs (all the APIs).
   
   Other scenarios can be two concurrent spark sessions.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1122533913


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -310,19 +312,28 @@ public Configuration getConf() {
 
   @VisibleForTesting
   String currentHash() {
-    return client.getRef().getHash();
+    return refreshedClient().getRef().getHash();
   }
 
   @VisibleForTesting
   String currentRefName() {
-    return client.getRef().getName();
+    return refreshedClient().getRef().getName();
   }
 
   @VisibleForTesting
   FileIO fileIO() {
     return fileIO;
   }
 
+  private NessieIcebergClient refreshedClient() {
+    try {
+      client.refresh();

Review Comment:
   >  Why not simply call client.refresh() where we need to
   
   Because of exception handling. I can't modify each method to throw `NessieNotFoundException` or catching `NessieNotFoundException` in each method is more lines of code. Hence, I extracted a method. 
   
   > why not call the method simply client()?
   
   ok. I will rename it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1501333776

   Thanks, @snazy and @dimas-b for the review. 
   @Fokko can you please help in merging this? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138176463


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -71,6 +73,15 @@ public Reference getReference() {
     return reference;
   }
 
+  public Reference getReferenceForApiRequest() {

Review Comment:
   >  I suspect, there's a simpler way to just use the reference name, right?
   
   I changed to use just the reference name in the builder. But it contradicts with previous comment (handling things with less code changes) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "snazy (via GitHub)" <gi...@apache.org>.
snazy commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1434626097

   Sure this does not break the expected-hash calculations for commits?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat closed pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat closed pull request #6789: Nessie: Use latest hash for catalog APIs
URL: https://github.com/apache/iceberg/pull/6789


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138603710


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -491,6 +519,15 @@ private boolean isSnapshotOperation(TableMetadata base, TableMetadata metadata)
             || snapshot.snapshotId() != base.currentSnapshot().snapshotId());
   }
 
+  private void addReference(OnReferenceBuilder<?> builder) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138312883


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -60,13 +60,6 @@ public String getHash() {
     return reference.getHash();
   }
 
-  public Branch getAsBranch() {

Review Comment:
   > This method a safe cast
   
   because all the places where this was called, already there is a check for `getRef().checkMutable()`. So, no need for one more method call that checks if it is a branch. Hence, directly casting to branch.
   
   I will remove `isBranch` too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on a diff in pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1122340208


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -310,19 +312,28 @@ public Configuration getConf() {
 
   @VisibleForTesting
   String currentHash() {
-    return client.getRef().getHash();
+    return refreshedClient().getRef().getHash();
   }
 
   @VisibleForTesting
   String currentRefName() {
-    return client.getRef().getName();
+    return refreshedClient().getRef().getName();
   }
 
   @VisibleForTesting
   FileIO fileIO() {
     return fileIO;
   }
 
+  private NessieIcebergClient refreshedClient() {
+    try {
+      client.refresh();

Review Comment:
   Alternatively, since the code does not access the `client` field directly now (except for init and refresh), why not call the method simply `client()`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1122543359


##########
nessie/src/test/java/org/apache/iceberg/nessie/TestMultipleClients.java:
##########
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.util.AbstractMap;
+import java.util.Collections;
+import org.apache.commons.io.FileUtils;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableMetadata;
+import org.apache.iceberg.TableOperations;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.types.Types;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+
+public class TestMultipleClients extends BaseTestIceberg {
+
+  private static final String BRANCH = "multiple-clients-test";
+  private static final Schema schema =
+      new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields());
+
+  public TestMultipleClients() {
+    super(BRANCH);
+  }
+
+  // another client that connects to the same nessie server.
+  NessieCatalog anotherCatalog;
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+      throws IOException {
+    super.beforeEach(clientFactory, nessieUri);
+    anotherCatalog = initCatalog(branch);
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+    dropTables(catalog);
+    dropTables(anotherCatalog);

Review Comment:
   I followed `TestNessieTable` which was cleaning the table files located in a temp dir.
   Other test cases doesn't have it. 
   
   Agree that this should be at base test class level. I will handle it in a follow up PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1438592791

   > [...]  In that case, txn can only commit one successful commit?
   
   The Nessie server is able to resolve non-conflicting changes to the catalog, even when the expected hash is behind the current branch HEAD.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1466336822

   one of the build failed due to #7023 
   
   re-triggering the build. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat closed pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat closed pull request #6789: Nessie: Use latest hash for catalog APIs
URL: https://github.com/apache/iceberg/pull/6789


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1496979144

   I think the PR was ready.  Please recheck. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "snazy (via GitHub)" <gi...@apache.org>.
snazy commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1451652760

   While the code looks technically correct, this change introduces an unnecessary extra round-trip. Usually, people work on the HEAD of a Nessie _branch_, which, for most Nessie operations, does _not_ require the client to know the HEAD's commit-ID and pass it back to the server.
   
   IMO the `NessieIcebergClient` needs to be able to handle the case of referencing the HEAD of a branch for `NessieCatalog` (and only pass the branch/tag name to the Nessie server), but be able provide an instance of `NessieIcebergClient` _with_ the expected hash / commit ID for `newTableOps`, `renameTable`, `dropTable`, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "snazy (via GitHub)" <gi...@apache.org>.
snazy commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138293161


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -307,12 +320,18 @@ namespace, getRef().getName()),
 
   public void renameTable(TableIdentifier from, TableIdentifier to) {
     getRef().checkMutable();
+    try {
+      getRef().refresh(api);
+    } catch (NessieNotFoundException e) {
+      throw new RuntimeException(e);
+    }
 
-    IcebergTable existingFromTable = table(from);
+    UpdateableReference updateableReference = getRef();
+    IcebergTable existingFromTable = table(from, updateableReference.getReference());

Review Comment:
   What is the point of allowing an explicit hash here and below? Shouldn't it fail, if an explicit hash is provided?



##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -60,13 +60,6 @@ public String getHash() {
     return reference.getHash();
   }
 
-  public Branch getAsBranch() {

Review Comment:
   > I deprecated it instead of removing it because it is a public method.
   
   This method a safe cast - you replaced it (not clear for which reason) with an unconditional cast (and BTW left another then unused method).



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -491,6 +519,15 @@ private boolean isSnapshotOperation(TableMetadata base, TableMetadata metadata)
             || snapshot.snapshotId() != base.currentSnapshot().snapshotId());
   }
 
+  private void addReference(OnReferenceBuilder<?> builder) {

Review Comment:
   Call sites would be be simpler, if this would return the properly typed builder back.



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -374,8 +393,15 @@ public void renameTable(TableIdentifier from, TableIdentifier to) {
 
   public boolean dropTable(TableIdentifier identifier, boolean purge) {
     getRef().checkMutable();
+    try {
+      getRef().refresh(api);
+    } catch (NessieNotFoundException e) {
+      throw new RuntimeException(e);
+    }
+
+    UpdateableReference updateableReference = getRef();
 
-    IcebergTable existingTable = table(identifier);
+    IcebergTable existingTable = table(identifier, updateableReference.getReference());

Review Comment:
   Same comments as for renameTable() 



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -165,10 +166,10 @@ private TableIdentifier toIdentifier(EntriesResponse.Entry entry) {
     return TableIdentifier.of(elements.toArray(new String[elements.size()]));
   }
 
-  public IcebergTable table(TableIdentifier tableIdentifier) {
+  public IcebergTable table(TableIdentifier tableIdentifier, Reference ref) {

Review Comment:
   What's the real world scenario then wrt user interaction?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1161830915


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -60,13 +60,6 @@ public String getHash() {
     return reference.getHash();
   }
 
-  public Branch getAsBranch() {

Review Comment:
   nit: how about returning a `Branch` from `checkMutable` to avoid type casts in the calling code? (just a thought, no need to delay this PR because of this).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1161830915


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -60,13 +60,6 @@ public String getHash() {
     return reference.getHash();
   }
 
-  public Branch getAsBranch() {

Review Comment:
   nit: how about returning a `Branch` from `checkMutable` to avoid type casts in the calling code? (just a though, no need to delay this PR because of this).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on a diff in pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1122266599


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -194,7 +195,7 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
         ContentKey.of(
             org.projectnessie.model.Namespace.of(tableIdentifier.namespace().levels()),
             tr.getName()),
-        client.withReference(tr.getReference(), tr.getHash()),
+        refreshedClient().withReference(tr.getReference(), tr.getHash()),

Review Comment:
   I believe in this case, since the client goes into `NessieTableOperations`, its reference should be loaded immediately and made immutable regardless of whether the commit hash is specified or not. Currently, the reference will be loaded on first use, which may happen a lot later. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs that doesn't use table operations

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1434470512

   Restarting the build.
   
   ```
   Execution failed for task ':iceberg-spark:iceberg-spark-3.3_2.12:checkstyleMain'.
   See https://docs.gradle.org/7.6/userguide/command_line_interface.html#sec:command_line_warnings
   > A failure occurred while executing org.gradle.api.plugins.quality.internal.CheckstyleAction
   499 actionable tasks: 499 executed
      > An unexpected error occurred configuring and executing Checkstyle.
         > java.lang.Error: Error was thrown while processing /home/runner/work/iceberg/iceberg/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Handle refresh for catalog APIs that doesn't use table operations

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1101750610


##########
nessie/src/test/java/org/apache/iceberg/nessie/TestMultipleClients.java:
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import org.apache.commons.io.FileUtils;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.types.Types;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+
+public class TestMultipleClients extends BaseTestIceberg {
+
+  private static final String BRANCH = "iceberg-table-test";
+  private static final Schema schema =
+      new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields());
+
+  public TestMultipleClients() {
+    super(BRANCH);
+  }
+
+  // another client that connects to the same nessie server.
+  NessieCatalog anotherCatalog;
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+      throws IOException {
+    super.beforeEach(clientFactory, nessieUri);
+    anotherCatalog = initCatalog(branch);
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+    dropTables(catalog);
+    dropTables(anotherCatalog);
+
+    super.afterEach();
+    anotherCatalog.close();
+  }
+
+  @Test
+  public void testListTables() {
+    createTable(catalog, TableIdentifier.parse("foo.tbl1"));
+    Assertions.assertThat(catalog.listTables(Namespace.of("foo")))
+        .containsExactlyInAnyOrder(TableIdentifier.parse("foo.tbl1"));
+
+    // another client create tables with the same nessie server
+    createTable(anotherCatalog, TableIdentifier.parse("foo.tbl2"));
+    Assertions.assertThat(anotherCatalog.listTables(Namespace.of("foo")))
+        .containsExactlyInAnyOrder(
+            TableIdentifier.parse("foo.tbl1"), TableIdentifier.parse("foo.tbl2"));
+
+    Assertions.assertThat(catalog.listTables(Namespace.of("foo")))
+        .containsExactlyInAnyOrder(

Review Comment:
   This particular check fails without this PR. As, `catalog` refers to the old state for `listTables()`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138604454


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -307,12 +320,18 @@ namespace, getRef().getName()),
 
   public void renameTable(TableIdentifier from, TableIdentifier to) {
     getRef().checkMutable();
+    try {
+      getRef().refresh(api);
+    } catch (NessieNotFoundException e) {
+      throw new RuntimeException(e);
+    }
 
-    IcebergTable existingFromTable = table(from);
+    UpdateableReference updateableReference = getRef();
+    IcebergTable existingFromTable = table(from, updateableReference.getReference());

Review Comment:
   I get it now. rename and drop table doesn't have to worry about table state. Reverted it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat closed pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat closed pull request #6789: Nessie: Add refresh for catalog APIs
URL: https://github.com/apache/iceberg/pull/6789


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on a diff in pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1122435678


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -194,7 +195,7 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
         ContentKey.of(
             org.projectnessie.model.Namespace.of(tableIdentifier.namespace().levels()),
             tr.getName()),
-        client.withReference(tr.getReference(), tr.getHash()),
+        refreshedClient().withReference(tr.getReference(), tr.getHash()),

Review Comment:
   Actually, I think I was confused. Current code LGTM in this case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs that doesn't use table operations

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1434549025

   cc: @snazy, @dimas-b, @nastra   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1436494627

   > I think so. This issue is not specific to this PR. I just think this PR makes it more likely to manifest.
   Would it be possible to keep the "expected" hash at the Tx level?
   
   @dimas-b : In that case, txn can only commit one successful commit?
   
   > Sure this does not break the expected-hash calculations for commits?
   
   @snazy: can you please elaborate on this? Without this PR also, concurrent commits from the same client can refresh the catalog and cause breakage of expected-hash. 
   
   a) Are you suggesting that this PR will increase the probability of that event happening and we don't want to do that?  
   b) Even the sequential updates have issues mentioned in the PR description and I want to fix them. Should I keep the changes only for `listTables(), listNamespaces(), loadNamespaceMetadata(), setProperties(), removeProperties()`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1443230335

   @snazy & @dimas-b: 
   
   ```
   BaseTransaction.commitTransaction()
   -> BaseMetastoreTableOperations.commit() with refresh client 
   -> NessieTableOperations.doCommit() 
   -> NessieIcebergClient.commitTable with the expected hash. 
   ```
    
    Based on my analysis, the Nessie commit operation flow is as above. There are chances that we use older commit hash as the expected hash with the latest client. For this, Nessie backend server will fail the commit and Iceberg will retry the transaction/commit using this code[1] by rebasing the metadata and client.
    
    [1] 
   https://github.com/apache/iceberg/blob/3efaee1790fe541d0969e12fa24afa4e1a3041c3/core/src/main/java/org/apache/iceberg/BaseTransaction.java#L354-L359
   
   `So, Even if we add some code to update/refresh the expected hash to be the latest. It is not enough as TableMetadata will hold the old state. So, Nessie client cannot modify table metadata. The existing retry mechanism will handle this refresh. 
   `
   
   So, I don't think we need to change anything. Please clarify or guide me if you think it is incorrect. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat closed pull request #6789: Nessie: Add refresh for catalog APIs that doesn't use table operations

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat closed pull request #6789: Nessie: Add refresh for catalog APIs that doesn't use table operations
URL: https://github.com/apache/iceberg/pull/6789


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1434697865

   > Sure this does not break the expected-hash calculations for commits?
   
   I just added a testcase for commit from multiple catalogs on same branch + same table. It passes. 
   So, I think there is no impact. 
   Also, expected hash is passed after the refresh operation from NessieCatalog. So, the client won't be modified after passing expected hash in the same flow.
   
   @snazy: Do you see any problem with the current changes? Do you feel we should only modify `listTables, listNamespaces, listNamespaceMetadata, removeProperties,  setProperties` instead of refreshing all the APIs? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1451190748

   @dimas-b: Thanks for the review. I have replied to the comments. Please check. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1446045524

   @snazy, @dimas-b: 
   I checked some more code today.
   **I found that `NessieTableOperation` works on a copy of the client. Not the actual client from `NessieCatalog`. as `withReference` creates a new client object.**
   https://github.com/apache/iceberg/blob/master/nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java#L197
   
   And the client inside the `NessieTableOperation` will never be refreshed/out of state. There is also this check.
   https://github.com/apache/iceberg/blob/cb612ed142b600a7a4c8bb756fb71ffa5c1a0909/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L121
   
   TLDR;
   PR changes are for the client of `NessieCatalog`, not for the client of `NessieTableOperations`. Hence, I don't think there is any impact on the expected hash calculation of the commits of `NessieTableOperations`.
   
   Please recheck the PR again and let me know if a change required on this.
     


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138154200


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -165,10 +166,10 @@ private TableIdentifier toIdentifier(EntriesResponse.Entry entry) {
     return TableIdentifier.of(elements.toArray(new String[elements.size()]));
   }
 
-  public IcebergTable table(TableIdentifier tableIdentifier) {
+  public IcebergTable table(TableIdentifier tableIdentifier, Reference ref) {

Review Comment:
   As mentioned, I changed `renameTable` and `dropTable` (it was not using TableOperation) to be similar to `commitTable`. 
   
   If I don't pass the reference, calling `getRef()` may give refreshed reference (maybe refreshed by concurrent commit).  So, to maintain the same state of reference for a whole operation, I am holding the state and passing it. 
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138155762


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -71,6 +73,15 @@ public Reference getReference() {
     return reference;
   }
 
+  public Reference getReferenceForApiRequest() {

Review Comment:
   True that we can just pass the reference name for this request. But in some cases, caller has to pass the hash (when use reference + hash is set). 
   
   So, to avoid having a check in all the callers (whether need to pass hash or not). I am passing like this. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138176463


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -71,6 +73,15 @@ public Reference getReference() {
     return reference;
   }
 
+  public Reference getReferenceForApiRequest() {

Review Comment:
   >  I suspect, there's a simpler way to just use the reference name, right?
   
   I changed to use just the reference name in the builder now. But it contradicts with previous comment (handling things with less code changes) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1434943296

   Isn't now also (without this PR) if two concurrent commits happen for a same table, the client can get refreshed between the 
   caller of this
   https://github.com/apache/iceberg/blob/bb95e8bb1a9c283b86afeae978d2f838456c5d03/nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java#L423 
   to 
   https://github.com/apache/iceberg/blob/bb95e8bb1a9c283b86afeae978d2f838456c5d03/nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java#L430
   
   because of existing refresh in this code by concurrent operations?
   https://github.com/apache/iceberg/blob/bb95e8bb1a9c283b86afeae978d2f838456c5d03/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L119
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1435042523

   > Isn't now also (without this PR) if two concurrent commits happen for a same table, the client can get refreshed [...]
   I think so. This issue is not specific to this PR. I just think this PR makes it more likely to manifest.
   
   Would it be possible to keep the "expected" hash at the Tx level?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1451320449

   This flaky error again !
   
   ```
   * What went wrong:
   Execution failed for task ':iceberg-spark:iceberg-spark-3.3_2.12:checkstyleMain'.
   > A failure occurred while executing org.gradle.api.plugins.quality.internal.CheckstyleAction
      > An unexpected error occurred configuring and executing Checkstyle.
         > java.lang.Error: Error was thrown while processing /home/runner/work/iceberg/iceberg/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat closed pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat closed pull request #6789: Nessie: Add refresh for catalog APIs
URL: https://github.com/apache/iceberg/pull/6789


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on a diff in pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1122203463


##########
nessie/src/test/java/org/apache/iceberg/nessie/TestMultipleClients.java:
##########
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.util.AbstractMap;
+import java.util.Collections;
+import org.apache.commons.io.FileUtils;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableMetadata;
+import org.apache.iceberg.TableOperations;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.types.Types;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+
+public class TestMultipleClients extends BaseTestIceberg {
+
+  private static final String BRANCH = "multiple-clients-test";
+  private static final Schema schema =
+      new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields());
+
+  public TestMultipleClients() {
+    super(BRANCH);
+  }
+
+  // another client that connects to the same nessie server.
+  NessieCatalog anotherCatalog;
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+      throws IOException {
+    super.beforeEach(clientFactory, nessieUri);
+    anotherCatalog = initCatalog(branch);
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+    dropTables(catalog);
+    dropTables(anotherCatalog);

Review Comment:
   Why bother? The super class will delete/reset all Nessie data. Table files are located in a temp. dir.
   
   If the intention is to remove table's files, wouldn't it be preferable to do that in the base class and the FS level for all tests after resetting Nessie?



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -194,7 +195,7 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
         ContentKey.of(
             org.projectnessie.model.Namespace.of(tableIdentifier.namespace().levels()),
             tr.getName()),
-        client.withReference(tr.getReference(), tr.getHash()),
+        refreshedClient().withReference(tr.getReference(), tr.getHash()),

Review Comment:
   I believe in this case, since the client goes into `NessieTableOperations`, it's reference should be loaded immediately and made immutable regardless of whether the commit hash is specified or not. Currently, the reference will be loaded on first use, which may happen a lot later. WDYT?



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -310,19 +312,28 @@ public Configuration getConf() {
 
   @VisibleForTesting
   String currentHash() {
-    return client.getRef().getHash();
+    return refreshedClient().getRef().getHash();
   }
 
   @VisibleForTesting
   String currentRefName() {
-    return client.getRef().getName();
+    return refreshedClient().getRef().getName();
   }
 
   @VisibleForTesting
   FileIO fileIO() {
     return fileIO;
   }
 
+  private NessieIcebergClient refreshedClient() {
+    try {
+      client.refresh();

Review Comment:
   Since this method does not return a new client instance (compared to, e.g., `NessieIcebergClient.withReference()`), using it to get the client for further actions looks a bit misleading to me. Why not simply call `client.refresh()` where we need to?



##########
nessie/src/test/java/org/apache/iceberg/nessie/TestMultipleClients.java:
##########
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.util.AbstractMap;
+import java.util.Collections;
+import org.apache.commons.io.FileUtils;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableMetadata;
+import org.apache.iceberg.TableOperations;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.types.Types;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+
+public class TestMultipleClients extends BaseTestIceberg {
+
+  private static final String BRANCH = "multiple-clients-test";
+  private static final Schema schema =
+      new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields());
+
+  public TestMultipleClients() {
+    super(BRANCH);
+  }
+
+  // another client that connects to the same nessie server.
+  NessieCatalog anotherCatalog;
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+      throws IOException {
+    super.beforeEach(clientFactory, nessieUri);
+    anotherCatalog = initCatalog(branch);
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+    dropTables(catalog);
+    dropTables(anotherCatalog);
+
+    super.afterEach();
+    anotherCatalog.close();
+  }
+
+  @Test
+  public void testListNamespaces() {
+    catalog.createNamespace(Namespace.of("db1"), Collections.emptyMap());
+    Assertions.assertThat(catalog.listNamespaces()).containsExactlyInAnyOrder(Namespace.of("db1"));
+
+    // another client creates a namespace with the same nessie server
+    anotherCatalog.createNamespace(Namespace.of("db2"), Collections.emptyMap());
+    Assertions.assertThat(anotherCatalog.listNamespaces())
+        .containsExactlyInAnyOrder(Namespace.of("db1"), Namespace.of("db2"));
+
+    Assertions.assertThat(catalog.listNamespaces())
+        .containsExactlyInAnyOrder(Namespace.of("db1"), Namespace.of("db2"));
+  }
+
+  @Test
+  public void testLoadNamespaceMetadata() {
+    catalog.createNamespace(Namespace.of("namespace1"), Collections.emptyMap());
+    Assertions.assertThat(catalog.listNamespaces())
+        .containsExactlyInAnyOrder(Namespace.of("namespace1"));
+
+    // another client adds a metadata to the same namespace
+    anotherCatalog.setProperties(Namespace.of("namespace1"), Collections.singletonMap("k1", "v1"));
+    AbstractMap.SimpleEntry<String, String> entry = new AbstractMap.SimpleEntry<>("k1", "v1");
+    Assertions.assertThat(anotherCatalog.loadNamespaceMetadata(Namespace.of("namespace1")))
+        .containsExactly(entry);
+
+    Assertions.assertThat(catalog.loadNamespaceMetadata(Namespace.of("namespace1")))
+        .containsExactly(entry);
+  }
+
+  @Test
+  public void testListTables() {
+    catalog.createTable(TableIdentifier.parse("foo.tbl1"), schema);
+    Assertions.assertThat(catalog.listTables(Namespace.of("foo")))
+        .containsExactlyInAnyOrder(TableIdentifier.parse("foo.tbl1"));
+
+    // another client creates a table with the same nessie server
+    anotherCatalog.createTable(TableIdentifier.parse("foo.tbl2"), schema);
+    Assertions.assertThat(anotherCatalog.listTables(Namespace.of("foo")))
+        .containsExactlyInAnyOrder(
+            TableIdentifier.parse("foo.tbl1"), TableIdentifier.parse("foo.tbl2"));
+
+    Assertions.assertThat(catalog.listTables(Namespace.of("foo")))
+        .containsExactlyInAnyOrder(
+            TableIdentifier.parse("foo.tbl1"), TableIdentifier.parse("foo.tbl2"));
+  }
+
+  @Test
+  public void testCommits() {
+    TableIdentifier identifier = TableIdentifier.parse("foo.tbl1");
+    catalog.createTable(identifier, schema);
+    Table tableFromCatalog = catalog.loadTable(identifier);
+    tableFromCatalog.updateSchema().addColumn("x1", Types.LongType.get()).commit();
+
+    Table tableFromAnotherCatalog = anotherCatalog.loadTable(identifier);
+    tableFromAnotherCatalog.updateSchema().addColumn("x2", Types.LongType.get()).commit();
+
+    tableFromCatalog.updateSchema().addColumn("x3", Types.LongType.get()).commit();
+    tableFromAnotherCatalog.updateSchema().addColumn("x4", Types.LongType.get()).commit();
+
+    Assertions.assertThat(catalog.loadTable(identifier).schema().columns()).hasSize(5);
+    Assertions.assertThat(anotherCatalog.loadTable(identifier).schema().columns()).hasSize(5);
+  }
+
+  @Test
+  public void testConcurrentCommitsWithRefresh() {
+    TableIdentifier identifier = TableIdentifier.parse("foo.tbl1");
+    catalog.createTable(identifier, schema);
+
+    String hashBefore = catalog.currentHash();
+
+    TableOperations ops1 = catalog.newTableOps(identifier);
+    TableMetadata current1 =

Review Comment:
   nit: `current1` -> `metadata1`



##########
nessie/src/test/java/org/apache/iceberg/nessie/TestMultipleClients.java:
##########
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.util.AbstractMap;
+import java.util.Collections;
+import org.apache.commons.io.FileUtils;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableMetadata;
+import org.apache.iceberg.TableOperations;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.types.Types;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+
+public class TestMultipleClients extends BaseTestIceberg {
+
+  private static final String BRANCH = "multiple-clients-test";
+  private static final Schema schema =
+      new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields());
+
+  public TestMultipleClients() {
+    super(BRANCH);
+  }
+
+  // another client that connects to the same nessie server.
+  NessieCatalog anotherCatalog;
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+      throws IOException {
+    super.beforeEach(clientFactory, nessieUri);
+    anotherCatalog = initCatalog(branch);
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+    dropTables(catalog);
+    dropTables(anotherCatalog);
+
+    super.afterEach();

Review Comment:
   Why override `afterEach`? This test class can have its own `@AfterEach` method for closing the catalog.



##########
nessie/src/test/java/org/apache/iceberg/nessie/TestMultipleClients.java:
##########
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.util.AbstractMap;
+import java.util.Collections;
+import org.apache.commons.io.FileUtils;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableMetadata;
+import org.apache.iceberg.TableOperations;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.types.Types;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+
+public class TestMultipleClients extends BaseTestIceberg {
+
+  private static final String BRANCH = "multiple-clients-test";
+  private static final Schema schema =
+      new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields());
+
+  public TestMultipleClients() {
+    super(BRANCH);
+  }
+
+  // another client that connects to the same nessie server.
+  NessieCatalog anotherCatalog;
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+      throws IOException {
+    super.beforeEach(clientFactory, nessieUri);
+    anotherCatalog = initCatalog(branch);
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+    dropTables(catalog);
+    dropTables(anotherCatalog);
+
+    super.afterEach();
+    anotherCatalog.close();
+  }
+
+  @Test
+  public void testListNamespaces() {
+    catalog.createNamespace(Namespace.of("db1"), Collections.emptyMap());
+    Assertions.assertThat(catalog.listNamespaces()).containsExactlyInAnyOrder(Namespace.of("db1"));
+
+    // another client creates a namespace with the same nessie server
+    anotherCatalog.createNamespace(Namespace.of("db2"), Collections.emptyMap());
+    Assertions.assertThat(anotherCatalog.listNamespaces())
+        .containsExactlyInAnyOrder(Namespace.of("db1"), Namespace.of("db2"));
+
+    Assertions.assertThat(catalog.listNamespaces())
+        .containsExactlyInAnyOrder(Namespace.of("db1"), Namespace.of("db2"));
+  }
+
+  @Test
+  public void testLoadNamespaceMetadata() {
+    catalog.createNamespace(Namespace.of("namespace1"), Collections.emptyMap());
+    Assertions.assertThat(catalog.listNamespaces())
+        .containsExactlyInAnyOrder(Namespace.of("namespace1"));
+
+    // another client adds a metadata to the same namespace
+    anotherCatalog.setProperties(Namespace.of("namespace1"), Collections.singletonMap("k1", "v1"));
+    AbstractMap.SimpleEntry<String, String> entry = new AbstractMap.SimpleEntry<>("k1", "v1");
+    Assertions.assertThat(anotherCatalog.loadNamespaceMetadata(Namespace.of("namespace1")))
+        .containsExactly(entry);
+
+    Assertions.assertThat(catalog.loadNamespaceMetadata(Namespace.of("namespace1")))
+        .containsExactly(entry);
+  }
+
+  @Test
+  public void testListTables() {
+    catalog.createTable(TableIdentifier.parse("foo.tbl1"), schema);
+    Assertions.assertThat(catalog.listTables(Namespace.of("foo")))
+        .containsExactlyInAnyOrder(TableIdentifier.parse("foo.tbl1"));
+
+    // another client creates a table with the same nessie server
+    anotherCatalog.createTable(TableIdentifier.parse("foo.tbl2"), schema);
+    Assertions.assertThat(anotherCatalog.listTables(Namespace.of("foo")))
+        .containsExactlyInAnyOrder(
+            TableIdentifier.parse("foo.tbl1"), TableIdentifier.parse("foo.tbl2"));
+
+    Assertions.assertThat(catalog.listTables(Namespace.of("foo")))
+        .containsExactlyInAnyOrder(
+            TableIdentifier.parse("foo.tbl1"), TableIdentifier.parse("foo.tbl2"));
+  }
+
+  @Test
+  public void testCommits() {
+    TableIdentifier identifier = TableIdentifier.parse("foo.tbl1");
+    catalog.createTable(identifier, schema);
+    Table tableFromCatalog = catalog.loadTable(identifier);
+    tableFromCatalog.updateSchema().addColumn("x1", Types.LongType.get()).commit();
+
+    Table tableFromAnotherCatalog = anotherCatalog.loadTable(identifier);
+    tableFromAnotherCatalog.updateSchema().addColumn("x2", Types.LongType.get()).commit();
+
+    tableFromCatalog.updateSchema().addColumn("x3", Types.LongType.get()).commit();
+    tableFromAnotherCatalog.updateSchema().addColumn("x4", Types.LongType.get()).commit();
+
+    Assertions.assertThat(catalog.loadTable(identifier).schema().columns()).hasSize(5);
+    Assertions.assertThat(anotherCatalog.loadTable(identifier).schema().columns()).hasSize(5);
+  }
+
+  @Test
+  public void testConcurrentCommitsWithRefresh() {
+    TableIdentifier identifier = TableIdentifier.parse("foo.tbl1");
+    catalog.createTable(identifier, schema);
+
+    String hashBefore = catalog.currentHash();
+
+    TableOperations ops1 = catalog.newTableOps(identifier);
+    TableMetadata current1 =
+        TableMetadata.buildFrom(ops1.current()).setProperties(ImmutableMap.of("k1", "v1")).build();
+
+    // commit should succeed
+    TableOperations ops2 = catalog.newTableOps(identifier);
+    TableMetadata current2 =

Review Comment:
   nit: `current2` -> `metadata2`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1465909953

   >  While the code looks technically correct, this change introduces an unnecessary extra round-trip. Usually, people work on the HEAD of a Nessie branch, which, for most Nessie operations, does not require the client to know the HEAD's commit-ID and pass it back to the server.
   IMO the NessieIcebergClient needs to be able to handle the case of referencing the HEAD of a branch for NessieCatalog (and only pass the branch/tag name to the Nessie server), but be able provide an instance of NessieIcebergClient with the expected hash / commit ID for newTableOps, renameTable, dropTable, etc.
   
   @snazy, @dimas-b, @nastra: I have avoided the unnecessary extra round-trip by fixing it differently than before. Please take a look. Thanks. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1438585925

   +1 to fixing how Nessie Catalog (and related classes) track expected hash and refresh references.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138149788


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -60,6 +60,8 @@ public String getHash() {
     return reference.getHash();
   }
 
+  /** @deprecated will be removed in 1.3.0 */
+  @Deprecated
   public Branch getAsBranch() {

Review Comment:
   because all the places where this was called, already there is a check for `getRef().checkMutable()`.  So, no need for one more method calls that check if it is a branch. Hence, directly casting to branch. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1138175573


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -60,6 +60,8 @@ public String getHash() {
     return reference.getHash();
   }
 
+  /** @deprecated will be removed in 1.3.0 */
+  @Deprecated

Review Comment:
   I deprecated it instead of removing it because it is a public method. 
   Now, I realise that the class is not public. I removed this API now. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy commented on pull request #6789: Nessie: Add refresh for catalog APIs

Posted by "snazy (via GitHub)" <gi...@apache.org>.
snazy commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1436901624

   I'm suggesting to provide a fix for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Add refresh for catalog APIs that doesn't use table operations

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1109495367


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -223,13 +224,13 @@ protected String defaultWarehouseLocation(TableIdentifier table) {
 
   @Override
   public List<TableIdentifier> listTables(Namespace namespace) {
-    return client.listTables(namespace);
+    return refreshedClient().listTables(namespace);

Review Comment:
   refreshing the client is enough for a few APIs that doesn't use `TableOperations`. But the code looks odd. So, I am refreshing all the APIs as it is a lightweight operation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1466217487

   > This change adds too many unnecessary code changes. Most of the changes aren't justified, the required fix can likely be achieved with way less changes. The current PR also doesn't clean up renameTable wrt the scope of this PR.
   
   @snazy: I gave it another try. Please take a look. Also the `renameTable` and `dropTable` was not going via `TableOperations`. So, I have handled it similarly to the existing `commitTable` logic now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#issuecomment-1471408842

   one of the builds failed due to https://github.com/apache/iceberg/issues/7023
   
   https://github.com/apache/iceberg/actions/runs/4434173539/jobs/7779904892
   
   re-triggering the build.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "snazy (via GitHub)" <gi...@apache.org>.
snazy commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1137051869


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -60,6 +60,8 @@ public String getHash() {
     return reference.getHash();
   }
 
+  /** @deprecated will be removed in 1.3.0 */
+  @Deprecated

Review Comment:
   What's calling this one?



##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -60,6 +60,8 @@ public String getHash() {
     return reference.getHash();
   }
 
+  /** @deprecated will be removed in 1.3.0 */
+  @Deprecated
   public Branch getAsBranch() {

Review Comment:
   Don't understand why the existing call sites are changed.



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -165,10 +166,10 @@ private TableIdentifier toIdentifier(EntriesResponse.Entry entry) {
     return TableIdentifier.of(elements.toArray(new String[elements.size()]));
   }
 
-  public IcebergTable table(TableIdentifier tableIdentifier) {
+  public IcebergTable table(TableIdentifier tableIdentifier, Reference ref) {

Review Comment:
   This signature change with all the related changes makes the call sides more complicated - hard to see a justification for that.  



##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -71,6 +73,15 @@ public Reference getReference() {
     return reference;
   }
 
+  public Reference getReferenceForApiRequest() {

Review Comment:
   What's the point of returning a `Branch` object as a `Reference` if the hash's always `null`? I suspect, there's a simpler way to just use the reference name, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6789:
URL: https://github.com/apache/iceberg/pull/6789#discussion_r1162307749


##########
nessie/src/main/java/org/apache/iceberg/nessie/UpdateableReference.java:
##########
@@ -60,13 +60,6 @@ public String getHash() {
     return reference.getHash();
   }
 
-  public Branch getAsBranch() {

Review Comment:
   I think `checkMutable` is an early check to ensure we are working on a branch which will later be modified by a commit. Whereas `getRef` or `getAsBranch` is to get the current state of the reference. I am not sure about combining them at the moment. I will explore this in a follow-up a suggested. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko merged pull request #6789: Nessie: Use latest hash for catalog APIs

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko merged PR #6789:
URL: https://github.com/apache/iceberg/pull/6789


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org