You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/25 02:17:59 UTC
[GitHub] [iceberg] vanliu-tx opened a new issue #3969: Got exception if overwrite partitions with empty dataset by spark
vanliu-tx opened a new issue #3969:
URL: https://github.com/apache/iceberg/issues/3969
In 0.12.1, write empty dataset by Spark append mode will result in an empty snapshot created, write empty dataset by Spark overwrite mode will result in an exception below:
```
org.apache.spark.SparkException: Writing job aborted.
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:260)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at com.tencent.bk.base.datahub.iceberg.SparkUtils.writeTable(SparkUtils.java:248)
at com.tencent.bk.base.datahub.iceberg.TestSparkUtils.testWriteEmptyDs(TestSparkUtils.java:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: java.lang.IllegalStateException: Cannot determine partition spec: no data or delete files have been added
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkState(Preconditions.java:508)
at org.apache.iceberg.MergingSnapshotProducer.writeSpec(MergingSnapshotProducer.java:121)
at org.apache.iceberg.BaseReplacePartitions.apply(BaseReplacePartitions.java:58)
at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:164)
at org.apache.iceberg.BaseReplacePartitions.apply(BaseReplacePartitions.java:26)
at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:283)
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:282)
at org.apache.iceberg.BaseReplacePartitions.commit(BaseReplacePartitions.java:26)
at org.apache.iceberg.spark.source.Writer.commitOperation(Writer.java:169)
at org.apache.iceberg.spark.source.Writer.replacePartitions(Writer.java:195)
at org.apache.iceberg.spark.source.Writer.commit(Writer.java:145)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:76)
... 42 more
```
Overwrite partitions with emtpy dataset is the same as these lines of code:
```
Table t = loadTable(xxx);
t.newReplacePartitions().commit();
```
In the commit method, it calls down to https://github.com/apache/iceberg/blob/a1bd63d56751999ecee89b871992d7bac395fd52/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L121 and failed in precondition check.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] vanliu-tx commented on issue #3969: Got exception if overwrite partitions with empty dataset by spark
Posted by GitBox <gi...@apache.org>.
vanliu-tx commented on issue #3969:
URL: https://github.com/apache/iceberg/issues/3969#issuecomment-1040096871
closed by https://github.com/apache/iceberg/pull/2960
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] vanliu-tx commented on issue #3969: Got exception if overwrite partitions with empty dataset by spark
Posted by GitBox <gi...@apache.org>.
vanliu-tx commented on issue #3969:
URL: https://github.com/apache/iceberg/issues/3969#issuecomment-1040096871
closed by https://github.com/apache/iceberg/pull/2960
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] vanliu-tx closed issue #3969: Got exception if overwrite partitions with empty dataset by spark
Posted by GitBox <gi...@apache.org>.
vanliu-tx closed issue #3969:
URL: https://github.com/apache/iceberg/issues/3969
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] vanliu-tx closed issue #3969: Got exception if overwrite partitions with empty dataset by spark
Posted by GitBox <gi...@apache.org>.
vanliu-tx closed issue #3969:
URL: https://github.com/apache/iceberg/issues/3969
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org