You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "dramaticlly (via GitHub)" <gi...@apache.org> on 2023/05/19 00:53:50 UTC

[GitHub] [iceberg] dramaticlly opened a new pull request, #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

dramaticlly opened a new pull request, #7652:
URL: https://github.com/apache/iceberg/pull/7652

   This fix help throw proper validation exception instead of NullPointerException when create branch or tag on iceberg table without any snapshot. 
   
   ## Before
   ```scala
   
   Before
   ```scala
   scala> val snapshotId = branchOptions.snapshotId.getOrElse(table.currentSnapshot().snapshotId())
   java.lang.NullPointerException
     at $anonfun$res38$1(<console>:47)
     at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
     at scala.Option.getOrElse(Option.scala:189)
     ... 51 elided
   ```
   
   
   After
   ```scala
   scala> val snapshotId: java.lang.Long = branchOptions.snapshotId
             .orElse(Option(iceberg.table.currentSnapshot()).map(_.snapshotId()))
             .map(java.lang.Long.valueOf)
             .orNull
   snapshotId: Long = null
   ```
   
   CC @amogh-jahagirdar @zhangbutao 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #7652:
URL: https://github.com/apache/iceberg/pull/7652#discussion_r1199184015


##########
spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceBranchExec.scala:
##########
@@ -41,10 +42,17 @@ case class CreateOrReplaceBranchExec(
   override protected def run(): Seq[InternalRow] = {
     catalog.loadTable(ident) match {
       case iceberg: SparkTable =>
-        val snapshotId = branchOptions.snapshotId.getOrElse(iceberg.table.currentSnapshot().snapshotId())
+        val snapshotId: java.lang.Long = branchOptions.snapshotId
+          .orElse(Option(iceberg.table.currentSnapshot()).map(_.snapshotId()))
+          .map(java.lang.Long.valueOf)
+          .orNull
+
+        Preconditions.checkArgument(snapshotId != null,
+          "Cannot complete create or replace branch operation on %s, no valid snapshotId found", ident)

Review Comment:
   Nit in the message: "Cannot perform create or replace branch on %s, main has no snapshots". I think we can be explicit that this case only happens when main has no snapshots. if someone specifies an invalid snapshot explicitly  (`ALTER TABLE t1 CREATE BRANCH b1 AS OF VERSION 12345` where 12345 is invalid it will fail in the API.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #7652:
URL: https://github.com/apache/iceberg/pull/7652#discussion_r1199184015


##########
spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceBranchExec.scala:
##########
@@ -41,10 +42,17 @@ case class CreateOrReplaceBranchExec(
   override protected def run(): Seq[InternalRow] = {
     catalog.loadTable(ident) match {
       case iceberg: SparkTable =>
-        val snapshotId = branchOptions.snapshotId.getOrElse(iceberg.table.currentSnapshot().snapshotId())
+        val snapshotId: java.lang.Long = branchOptions.snapshotId
+          .orElse(Option(iceberg.table.currentSnapshot()).map(_.snapshotId()))
+          .map(java.lang.Long.valueOf)
+          .orNull
+
+        Preconditions.checkArgument(snapshotId != null,
+          "Cannot complete create or replace branch operation on %s, no valid snapshotId found", ident)

Review Comment:
   Nit in the message: "Cannot perform create or replace branch on %s, main has no snapshots". I think we can be explicit that this case only happens when main is invalid. if someone specifies an invalid snapshot explicitly it will fail in the API. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly commented on a diff in pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on code in PR #7652:
URL: https://github.com/apache/iceberg/pull/7652#discussion_r1199207988


##########
spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceBranchExec.scala:
##########
@@ -41,10 +42,17 @@ case class CreateOrReplaceBranchExec(
   override protected def run(): Seq[InternalRow] = {
     catalog.loadTable(ident) match {
       case iceberg: SparkTable =>
-        val snapshotId = branchOptions.snapshotId.getOrElse(iceberg.table.currentSnapshot().snapshotId())
+        val snapshotId: java.lang.Long = branchOptions.snapshotId
+          .orElse(Option(iceberg.table.currentSnapshot()).map(_.snapshotId()))
+          .map(java.lang.Long.valueOf)
+          .orNull
+
+        Preconditions.checkArgument(snapshotId != null,
+          "Cannot complete create or replace branch operation on %s, no valid snapshotId found", ident)

Review Comment:
   updated per your suggestion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zhangbutao commented on pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "zhangbutao (via GitHub)" <gi...@apache.org>.
zhangbutao commented on PR #7652:
URL: https://github.com/apache/iceberg/pull/7652#issuecomment-1556074209

   I think this fix is the same idea I had before. But it doesn't matter, I am glad to see it was fixed. Thanks. closing my previous pr.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly commented on a diff in pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on code in PR #7652:
URL: https://github.com/apache/iceberg/pull/7652#discussion_r1199158777


##########
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestBranchDDL.java:
##########
@@ -89,6 +91,13 @@ public void testCreateBranch() throws NoSuchTableException {
         () -> sql("ALTER TABLE %s CREATE BRANCH %s", tableName, branchName));
   }
 
+  @Test
+  public void testCreateBranchOnEmptyTable() {
+    Assertions.assertThatThrownBy(() -> sql("ALTER TABLE %s CREATE BRANCH %s", tableName, "b1"))
+        .isInstanceOf(ValidationException.class)
+        .hasMessageContaining("Cannot set b1 to unknown snapshot");

Review Comment:
   discussed offline, looks like this is relate to implicit scala Long to primitive java.lang.long conversion, seems fragile and confusion if we rely on that to guard against bad input. So we ended check snapshotId not null before pass to java code and also make unit test more self-contained. 
   
   @amogh-jahagirdar ready for another look, if new exception and error message looks good to you, I'll backport fix to other spark version and try to get it merged for 1.3.0 release



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar merged pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar merged PR #7652:
URL: https://github.com/apache/iceberg/pull/7652


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on PR #7652:
URL: https://github.com/apache/iceberg/pull/7652#issuecomment-1554654166

   @amogh-jahagirdar Would be great to get this into 1.3.0 as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #7652:
URL: https://github.com/apache/iceberg/pull/7652#discussion_r1199114303


##########
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestBranchDDL.java:
##########
@@ -89,6 +91,13 @@ public void testCreateBranch() throws NoSuchTableException {
         () -> sql("ALTER TABLE %s CREATE BRANCH %s", tableName, branchName));
   }
 
+  @Test
+  public void testCreateBranchOnEmptyTable() {
+    Assertions.assertThatThrownBy(() -> sql("ALTER TABLE %s CREATE BRANCH %s", tableName, "b1"))
+        .isInstanceOf(ValidationException.class)
+        .hasMessageContaining("Cannot set b1 to unknown snapshot");

Review Comment:
   Seems like we're relying on the throw from here? https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1170
   
   I'm not sure that's what we want to do, that's supposed to be for assigning a ref to an invalid snapshot. In this case we want to signal to a user that no explicit snapshot was specified and there is no current main so the `CREATE BRANCH` or `TAG` is invalid. Could we just explicitly throw based off that?
   
   On a side note, I'm not sure how we even make it that far into TableMetadata, I figured the `createBranch` API would throw an NPE since that API requires an explicit `long`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #7652:
URL: https://github.com/apache/iceberg/pull/7652#discussion_r1199114303


##########
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestBranchDDL.java:
##########
@@ -89,6 +91,13 @@ public void testCreateBranch() throws NoSuchTableException {
         () -> sql("ALTER TABLE %s CREATE BRANCH %s", tableName, branchName));
   }
 
+  @Test
+  public void testCreateBranchOnEmptyTable() {
+    Assertions.assertThatThrownBy(() -> sql("ALTER TABLE %s CREATE BRANCH %s", tableName, "b1"))
+        .isInstanceOf(ValidationException.class)
+        .hasMessageContaining("Cannot set b1 to unknown snapshot");

Review Comment:
   Seems like we're relying on the throw from here? https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1170
   
   I'm not sure that's what we want to do, that's supposed to be for assigning a ref to an invalid snapshot. In this case we want to signal to a user that no explicit snapshot was specified and there is no current main so the `CREATE BRANCH` or `TAG` is invalid. Could we just explicitly throw based off that?
   
   On a side note, I'm not sure how we even make it that far into TableMetadata, I figured the `createBranch` API would throw an NPE since that API requires an explicit `long` which would fail with NPE if the passed in Long was a null



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #7652: Spark 3.4: Fix NPE when create branch and tag on table without snapshot

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #7652:
URL: https://github.com/apache/iceberg/pull/7652#discussion_r1199114303


##########
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestBranchDDL.java:
##########
@@ -89,6 +91,13 @@ public void testCreateBranch() throws NoSuchTableException {
         () -> sql("ALTER TABLE %s CREATE BRANCH %s", tableName, branchName));
   }
 
+  @Test
+  public void testCreateBranchOnEmptyTable() {
+    Assertions.assertThatThrownBy(() -> sql("ALTER TABLE %s CREATE BRANCH %s", tableName, "b1"))
+        .isInstanceOf(ValidationException.class)
+        .hasMessageContaining("Cannot set b1 to unknown snapshot");

Review Comment:
   Seems like we're relying on the throw from here? https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1170
   
   I'm not sure that's what we want to do, that's supposed to be for assigning a ref to an invalid snapshot. In this case we want to signal to a user that no explicit snapshot was specified and there is no current main so the `CREATE BRANCH` or `TAG` is invalid. Could we just explicitly throw based off that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org