Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/01 19:41:57 UTC

[GitHub] [iceberg] pan3793 opened a new pull request #4024: Spark: Allow create table in hadoop catalog root namespace

pan3793 opened a new pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024


   This is a functional regression in Iceberg 0.13.0. Iceberg 0.12.x (I have not tested every earlier version) allows creating a table in the root namespace of a Hadoop catalog, but #3722 broke it.
   
   With Spark 3.2.0 and Iceberg 0.13.0, the following error occurs when trying to create a table in the root namespace of a Hadoop catalog:
   
   ```
   java.sql.SQLException: Error operating EXECUTE_STATEMENT: java.lang.NegativeArraySizeException
   	at java.lang.reflect.Array.newArray(Native Method)
   	at java.lang.reflect.Array.newInstance(Array.java:75)
   	at java.util.Arrays.copyOf(Arrays.java:3212)
   	at java.util.Arrays.copyOf(Arrays.java:3181)
   	at org.apache.iceberg.spark.SparkCatalog.namespaceToIdentifier(SparkCatalog.java:570)
   	at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:492)
   	at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:135)
   	at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:92)
   	at org.apache.spark.sql.connector.catalog.TableCatalog.tableExists(TableCatalog.java:119)
   	at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:40)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
   	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
   	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
   ``` 
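
   The top frames show `namespaceToIdentifier` calling `Arrays.copyOf`, which throws `NegativeArraySizeException` when asked for a negative length. Below is a minimal sketch of how that can happen. It assumes (this is not confirmed in the thread) that the method drops the last namespace level with `Arrays.copyOf(namespace, namespace.length - 1)` and uses that level as the table name. For a table in the root namespace the array is empty, so the requested length is -1. `RootNamespaceDemo`, `SimpleIdentifier`, and `canReinterpretNamespace` are hypothetical names standing in for Spark's `Identifier` and for whatever caller-side check the real fix uses.

   ```java
   import java.util.Arrays;

   public class RootNamespaceDemo {

     // Hypothetical stand-in for Spark's Identifier: namespace levels plus a name.
     static final class SimpleIdentifier {
       final String[] namespace;
       final String name;

       SimpleIdentifier(String[] namespace, String name) {
         this.namespace = namespace;
         this.name = name;
       }
     }

     // Assumed shape of the conversion: the last namespace level becomes the name.
     // For an empty (root) namespace, namespace.length - 1 is -1 and Arrays.copyOf
     // throws NegativeArraySizeException.
     static SimpleIdentifier namespaceToIdentifier(String[] namespace) {
       String[] ns = Arrays.copyOf(namespace, namespace.length - 1);
       String name = namespace[ns.length];
       return new SimpleIdentifier(ns, name);
     }

     // Hypothetical caller-side guard: only attempt the namespace-to-identifier
     // fallback when there is at least one namespace level to reinterpret.
     static boolean canReinterpretNamespace(String[] namespace) {
       return namespace.length > 0;
     }

     public static void main(String[] args) {
       SimpleIdentifier id = namespaceToIdentifier(new String[] {"db", "tbl"});
       System.out.println(Arrays.toString(id.namespace) + " " + id.name); // [db] tbl

       String[] root = new String[0];
       try {
         namespaceToIdentifier(root);
       } catch (NegativeArraySizeException e) {
         System.out.println("root namespace fails: " + e.getClass().getSimpleName());
       }
       System.out.println("guarded: " + canReinterpretNamespace(root)); // guarded: false
     }
   }
   ```

   A length check in the caller is one way to let root-namespace tables skip this fallback entirely; the change actually merged in #4024 may be structured differently.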


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028142961


   There are a couple of minor things to fix, but overall good catch. Thanks, @pan3793!



[GitHub] [iceberg] pan3793 commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace

pan3793 commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028227955


   Addressed the comments, and also ported the fix to Spark 3.0/3.1.



[GitHub] [iceberg] wypoon commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace

wypoon commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028246824


   LGTM. Thanks for catching this, @pan3793!



[GitHub] [iceberg] rdblue commented on a change in pull request #4024: Spark: Allow create table in hadoop catalog root namespace

rdblue commented on a change in pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#discussion_r797816930



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java
##########
@@ -40,8 +40,12 @@
 import org.junit.Test;
 
 public class TestCreateTable extends SparkCatalogTestBase {
+
+  private final boolean isHadoopCatalog;
+
   public TestCreateTable(String catalogName, String implementation, Map<String, String> config) {
     super(catalogName, implementation, config);
+    this.isHadoopCatalog = "testhadoop".equals(catalogName);

Review comment:
       There's no need for a field. Can you just move this test into the `Assume` line?





[GitHub] [iceberg] rdblue merged pull request #4024: Spark: Allow create table in hadoop catalog root namespace

rdblue merged pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024


   



[GitHub] [iceberg] rdblue commented on a change in pull request #4024: Spark: Allow create table in hadoop catalog root namespace

rdblue commented on a change in pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#discussion_r797816096



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -567,6 +571,7 @@ private static void checkNotPathIdentifier(Identifier identifier, String method)
   }
 
   private Identifier namespaceToIdentifier(String[] namespace) {
+    assert namespace.length > 0;

Review comment:
       We don't use assertions. If this is worth checking, then use a Precondition to create a readable error message.
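
   A Java `assert` only runs when the JVM is started with `-ea`, so in a normal deployment the assertion in the diff above would be skipped and the `NegativeArraySizeException` would still surface. The sketch below shows the alternative the review suggests. It uses a local `checkArgument` helper that mirrors the call shape of Guava's `Preconditions.checkArgument(boolean, String, Object...)`. The helper, `PreconditionGuardDemo`, `parentLevels`, and the error message are hypothetical, not the code merged in this PR.

   ```java
   import java.util.Arrays;

   public class PreconditionGuardDemo {

     // Local stand-in for Guava's Preconditions.checkArgument(boolean, String, Object...).
     // Guava substitutes only %s placeholders, which String.format also handles here.
     static void checkArgument(boolean expression, String template, Object... args) {
       if (!expression) {
         throw new IllegalArgumentException(String.format(template, args));
       }
     }

     // Hypothetical guarded split: returns the namespace levels that precede the name.
     // Unlike `assert`, this check always runs, whether or not -ea is set.
     static String[] parentLevels(String[] namespace) {
       checkArgument(namespace.length > 0,
           "Cannot convert empty namespace to identifier: %s", Arrays.toString(namespace));
       return Arrays.copyOf(namespace, namespace.length - 1);
     }

     public static void main(String[] args) {
       System.out.println(Arrays.toString(parentLevels(new String[] {"db", "tbl"}))); // [db]
       try {
         parentLevels(new String[0]);
       } catch (IllegalArgumentException e) {
         // prints: Cannot convert empty namespace to identifier: []
         System.out.println(e.getMessage());
       }
     }
   }
   ```

   The readable `IllegalArgumentException` tells the caller what went wrong, whereas a bare `NegativeArraySizeException` from deep inside `Arrays.copyOf` does not.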





[GitHub] [iceberg] pan3793 commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace

pan3793 commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1027596161


   cc @wypoon @rdblue 



[GitHub] [iceberg] rdblue commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace

rdblue commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028143608


   I'm adding this to 0.13.1 since it is a regression.



[GitHub] [iceberg] rdblue commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace

rdblue commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028402581


   Thanks, @pan3793!

