You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by "ShubhamSharmaCSE (via GitHub)" <gi...@apache.org> on 2023/03/27 20:53:48 UTC

[GitHub] [iceberg] ShubhamSharmaCSE opened a new issue, #7221: Multiple entries with same key

ShubhamSharmaCSE opened a new issue, #7221:
URL: https://github.com/apache/iceberg/issues/7221

   Getting below error when running spark application for partitioned iceberg table. Partition is based on created date over days.
   
   ```
   Exception in thread "main" java.lang.IllegalArgumentException: Multiple entries with same key: 1000=original_values.column1.column2.column3 and 1000=_partition.created_ts_day
   	at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:376)
   	at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:370)
   	at org.apache.iceberg.relocated.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:153)
   	at org.apache.iceberg.relocated.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:115)
   	at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.buildOrThrow(ImmutableMap.java:574)
   	at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:538)
   	at org.apache.iceberg.types.IndexByName.byId(IndexByName.java:80)
   	at org.apache.iceberg.types.TypeUtil.indexNameById(TypeUtil.java:161)
   	at org.apache.iceberg.Schema.lazyIdToName(Schema.java:167)
   	at org.apache.iceberg.Schema.<init>(Schema.java:108)
   	at org.apache.iceberg.Schema.<init>(Schema.java:91)
   	at org.apache.iceberg.Schema.<init>(Schema.java:83)
   	at org.apache.iceberg.Schema.<init>(Schema.java:79)
   	at org.apache.iceberg.types.TypeUtil.join(TypeUtil.java:149)
   	at org.apache.iceberg.spark.source.SparkScanBuilder.schemaWithMetadataColumns(SparkScanBuilder.java:159)
   	at org.apache.iceberg.spark.source.SparkScanBuilder.buildMergeOnReadScan(SparkScanBuilder.java:225)
   	at org.apache.iceberg.spark.source.SparkPositionDeltaOperation$1.build(SparkPositionDeltaOperation.java:70)
   	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pruneColumns(PushDownUtils.scala:176)
   	at org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown$$anonfun$apply$1.applyOrElse(RowLevelCommandScanRelationPushDown.scala:52)
   	at org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown$$anonfun$apply$1.applyOrElse(RowLevelCommandScanRelationPushDown.scala:40)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
   	at org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown$.apply(RowLevelCommandScanRelationPushDown.scala:40)
   	at org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown$.apply(RowLevelCommandScanRelationPushDown.scala:37)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:215)
   	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
   	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
   	at scala.collection.immutable.List.foldLeft(List.scala:91)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:212)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:284)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:327)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:284)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:274)
   	at scala.collection.immutable.List.foreach(List.scala:431)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:274)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:188)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:134)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:213)
   	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:552)
   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:213)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:212)
   	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:130)
   	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:126)
   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:296)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:657)
   	at org.apache.spark.sql.execution.QueryExecution.writePlans(QueryExecution.scala:296)
   	at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:313)
   	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:267)
   	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:246)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
   	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:222)
   	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:102)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:99)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

Re: [I] Multiple entries with same key [iceberg]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on issue #7221:
URL: https://github.com/apache/iceberg/issues/7221#issuecomment-1762413736

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

Re: [I] Multiple entries with same key [iceberg]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] closed issue #7221: Multiple entries with same key
URL: https://github.com/apache/iceberg/issues/7221


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] gaborkaszab commented on issue #7221: Multiple entries with same key

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.

gaborkaszab commented on issue #7221:
URL: https://github.com/apache/iceberg/issues/7221#issuecomment-1488170491

   I believe this is a duplicate of https://github.com/apache/iceberg/issues/6368
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] github-actions[bot] commented on issue #7221: Multiple entries with same key

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on issue #7221:
URL: https://github.com/apache/iceberg/issues/7221#issuecomment-1734632907

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org