Posted to issues@spark.apache.org by "Max Gekk (Jira)" <ji...@apache.org> on 2022/10/07 12:41:00 UTC

[jira] [Resolved] (SPARK-40521) PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions instead of the conflicting partition

     [ https://issues.apache.org/jira/browse/SPARK-40521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-40521.
------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 38134
[https://github.com/apache/spark/pull/38134]

> PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions instead of the conflicting partition
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40521
>                 URL: https://issues.apache.org/jira/browse/SPARK-40521
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Serge Rielau
>            Assignee: Max Gekk
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: Screen Shot 2022-09-21 at 10.08.44 AM.png, Screen Shot 2022-09-21 at 10.08.52 AM.png
>
>
> PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions instead of the conflicting partition
> When I run AlterTableAddPartitionSuiteBase for Hive, the test "partition already exists" fails in my local build ONLY in that mode, because it reports two partitions as conflicting where there should be only one. In all other modes the test succeeds.
> The test is passing on master because it does not check the reported partitions themselves.
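> For illustration, a stricter assertion might look like the sketch below. This is hypothetical test code, not the suite's actual check; it assumes the suite's sql helper, ScalaTest's intercept, and Spark's PartitionsAlreadyExistException, and matches on the Map(...) rendering shown in the output further down:
>
> import org.apache.spark.sql.catalyst.analysis.PartitionsAlreadyExistException
>
> val e = intercept[PartitionsAlreadyExistException] {
>   sql("ALTER TABLE t ADD PARTITION (c1 = 1) PARTITION (c1 = 2)")
> }
> // Only the partition that truly conflicts (c1 = 2) should be reported.
> assert(e.getMessage.contains("Map(c1 -> 2)"))
> assert(!e.getMessage.contains("Map(c1 -> 1)"))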
> Repro on master (note that partition c1 = 1 does not already exist, so it should NOT be listed):
> create table t(c1 int, c2 int) partitioned by (c1);
> alter table t add partition (c1 = 2);
> alter table t add partition (c1 = 1) partition (c1 = 2);
> 22/09/21 09:30:09 ERROR Hive: AlreadyExistsException(message:Partition already exists: Partition(values:[2], dbName:default, tableName:t, createTime:0, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:c2, type:int, comment:null)], location:file:/Users/serge.rielau/spark/spark-warehouse/t/c1=2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:null))
>  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.startAddPartition(HiveMetaStore.java:2744)
>  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:2442)
>  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:2560)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>  at com.sun.proxy.$Proxy31.add_partitions_req(Unknown Source)
>  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:625)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>  at com.sun.proxy.$Proxy32.add_partitions(Unknown Source)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createPartitions(Hive.java:2103)
>  at org.apache.spark.sql.hive.client.Shim_v0_13.createPartitions(HiveShim.scala:763)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createPartitions$1(HiveClientImpl.scala:631)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:296)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.createPartitions(HiveClientImpl.scala:624)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createPartitions$1(HiveExternalCatalog.scala:1039)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:1021)
>  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createPartitions(ExternalCatalogWithListener.scala:201)
>  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:1169)
>  at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.$anonfun$run$17(ddl.scala:514)
>  at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.$anonfun$run$17$adapted(ddl.scala:513)
>  at scala.collection.Iterator.foreach(Iterator.scala:943)
>  at scala.collection.Iterator.foreach$(Iterator.scala:943)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>  at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:513)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
>  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
>  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:111)
>  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:171)
>  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
>  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
>  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
>  at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
>  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
>  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
> ...
>  
> The following partitions already exists in table 't' database 'default':
> Map(c1 -> 1)
> ===
> Map(c1 -> 2)
> spark-sql> 
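>
> The expected behavior is to report only the specs that actually exist in the metastore. A minimal, self-contained sketch of that filtering in plain Scala (hypothetical names; the real change lives in the Hive catalog path touched by the pull request above):
>
> // Partition specs requested by ALTER TABLE t ADD PARTITION (c1 = 1) PARTITION (c1 = 2).
> val requested = Seq(Map("c1" -> "1"), Map("c1" -> "2"))
> // Specs already present in the metastore (stand-in for a real catalog lookup).
> val existing = Set(Map("c1" -> "2"))
> // Only the genuine conflicts should feed PartitionsAlreadyExistException.
> val conflicting = requested.filter(existing.contains)
> assert(conflicting == Seq(Map("c1" -> "2")))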



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org