Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/01 19:41:57 UTC
[GitHub] [iceberg] pan3793 opened a new pull request #4024: Spark: Allow create table in hadoop catalog root namespace
pan3793 opened a new pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024
This is a functional regression in Iceberg 0.13.0. At least in Iceberg 0.12.x (I have not tested every earlier version), Iceberg allowed creating a table under the root namespace of a hadoop catalog, but #3722 broke it.
With Spark 3.2.0 and Iceberg 0.13.0, the following error occurs when trying to create a table under the root namespace of a hadoop catalog.
```
java.sql.SQLException: Error operating EXECUTE_STATEMENT: java.lang.NegativeArraySizeException
at java.lang.reflect.Array.newArray(Native Method)
at java.lang.reflect.Array.newInstance(Array.java:75)
at java.util.Arrays.copyOf(Arrays.java:3212)
at java.util.Arrays.copyOf(Arrays.java:3181)
at org.apache.iceberg.spark.SparkCatalog.namespaceToIdentifier(SparkCatalog.java:570)
at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:492)
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:135)
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:92)
at org.apache.spark.sql.connector.catalog.TableCatalog.tableExists(TableCatalog.java:119)
at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:40)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
```
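For readers following the trace: the top frames show `Arrays.copyOf` being called from `SparkCatalog.namespaceToIdentifier` and failing on a negative length. A minimal sketch of how that can happen, assuming the method trims the last element with `namespace.length - 1` (the class `EmptyNamespaceDemo` and the helper `parentOf` are hypothetical, not the Iceberg source):

```java
import java.util.Arrays;

final class EmptyNamespaceDemo {
    // Hypothetical stand-in: drop the last level of a multi-part namespace.
    // Assumption: the failing code asks Arrays.copyOf for length - 1 elements.
    static String[] parentOf(String[] namespace) {
        return Arrays.copyOf(namespace, namespace.length - 1);
    }

    public static void main(String[] args) {
        // Non-empty namespace: the copy keeps everything except the last level.
        System.out.println(Arrays.toString(parentOf(new String[] {"db", "tbl"})));

        // Root (empty) namespace: the requested length is -1.
        try {
            parentOf(new String[0]);
        } catch (NegativeArraySizeException e) {
            System.out.println("empty namespace is rejected by Arrays.copyOf");
        }
    }
}
```

With an empty namespace the requested length is -1, which `Arrays.copyOf` rejects with `NegativeArraySizeException`, matching the top of the trace above.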
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] rdblue commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace
Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028142961
There are a couple minor things to fix, but overall good catch. Thanks, @pan3793!
[GitHub] [iceberg] pan3793 commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace
Posted by GitBox <gi...@apache.org>.
pan3793 commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028227955
Addressed the comments and also ported the fix to Spark 3.0/3.1.
[GitHub] [iceberg] wypoon commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace
Posted by GitBox <gi...@apache.org>.
wypoon commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028246824
LGTM. Thanks for catching this @pan3793!
[GitHub] [iceberg] rdblue commented on a change in pull request #4024: Spark: Allow create table in hadoop catalog root namespace
Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#discussion_r797816930
##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java
##########
@@ -40,8 +40,12 @@
import org.junit.Test;
public class TestCreateTable extends SparkCatalogTestBase {
+
+ private final boolean isHadoopCatalog;
+
public TestCreateTable(String catalogName, String implementation, Map<String, String> config) {
super(catalogName, implementation, config);
+ this.isHadoopCatalog = "testhadoop".equals(catalogName);
Review comment:
There's no need for a field. Can you just move this test into the `Assume` line?
[GitHub] [iceberg] rdblue merged pull request #4024: Spark: Allow create table in hadoop catalog root namespace
Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024
[GitHub] [iceberg] rdblue commented on a change in pull request #4024: Spark: Allow create table in hadoop catalog root namespace
Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#discussion_r797816096
##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -567,6 +571,7 @@ private static void checkNotPathIdentifier(Identifier identifier, String method)
}
private Identifier namespaceToIdentifier(String[] namespace) {
+ assert namespace.length > 0;
Review comment:
We don't use assertions. If this is worth checking, then use a Precondition to create a readable error message.
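A rough illustration of this suggestion, not the merged change: Java `assert` statements are skipped unless the JVM runs with `-ea`, while a precondition check always runs and carries a readable message. The sketch below uses a local `checkArgument` as a stand-in for a Guava-style `Preconditions.checkArgument`; the class name, the `lastLevel` helper, and the message text are assumptions made for the example.

```java
final class NamespaceCheckDemo {
    // Local stand-in for a Guava-style Preconditions.checkArgument (assumption).
    static void checkArgument(boolean expression, String message) {
        if (!expression) {
            throw new IllegalArgumentException(message);
        }
    }

    // Hypothetical helper: return the last level of a namespace,
    // failing with a readable message when the namespace is empty.
    static String lastLevel(String[] namespace) {
        checkArgument(namespace.length > 0,
            "Cannot convert an empty namespace to an identifier");
        return namespace[namespace.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(lastLevel(new String[] {"db", "tbl"}));
        try {
            lastLevel(new String[0]);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Unlike `assert namespace.length > 0`, the check fires regardless of JVM flags and tells the caller what went wrong.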
[GitHub] [iceberg] pan3793 commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace
Posted by GitBox <gi...@apache.org>.
pan3793 commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1027596161
cc @wypoon @rdblue
[GitHub] [iceberg] rdblue commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace
Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028143608
I'm adding this to 0.13.1 since it is a regression.
[GitHub] [iceberg] rdblue commented on pull request #4024: Spark: Allow create table in hadoop catalog root namespace
Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4024:
URL: https://github.com/apache/iceberg/pull/4024#issuecomment-1028402581
Thanks, @pan3793!