Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/24 15:58:36 UTC

[GitHub] [iceberg] RussellSpitzer commented on issue #2734: Getting - java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(1, 0) on MERGE INTO

RussellSpitzer commented on issue #2734:
URL: https://github.com/apache/iceberg/issues/2734#issuecomment-867758694


   This is an error with the write partitioning: the List(1, 0) in the message
   is the partition counts of the two RDDs being zipped, so one side of the
   zip came up empty. It is fixed by
   
   https://github.com/apache/iceberg/pull/2584
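   
   If you want to pick the fix up before your release line carries it, the
   usual route is to swap in a newer runtime jar (or one built from master).
   A sketch of what that looks like; 0.12.0 is my assumption for the first
   release to include that PR, so check the release notes rather than
   trusting the number:
   
   spark-sql \
     --packages org.apache.iceberg:iceberg-spark3-runtime:0.12.0 \
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
     ...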
   
   On Thu, Jun 24, 2021 at 10:54 AM Ayush Chauhan ***@***.***> wrote:
   
   > Hi,
   >
   > While running MERGE INTO, I am getting the following exception
   >
   > java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(1, 0)
   >   at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:58)
   >   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
   >   at scala.Option.getOrElse(Option.scala:189)
   >   at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
   >   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
   >   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
   >   at scala.Option.getOrElse(Option.scala:189)
   >   at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
   >   at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:366)
   >   at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:361)
   >   at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.writeWithV2(ReplaceDataExec.scala:26)
   >   at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.run(ReplaceDataExec.scala:31)
   >   at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
   >   at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
   >   at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
   >   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:230)
   >   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3667)
   >   at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
   >   at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
   >   at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
   >   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
   >   at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
   >   at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
   >   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
   >   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
   >   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
   >   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   >   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3665)
   >   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:230)
   >   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
   >   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
   >   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
   >   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
   >
   > My query
   >
   > MERGE INTO db_name.target_iceberg_table target
   > USING (SOME QUERY ON ICEBERG TABLE) temp
   > ON target.Id = temp.Id
   > WHEN MATCHED THEN UPDATE SET .....
   > WHEN NOT MATCHED THEN INSERT *
   >
   > Both source and target tables are Iceberg tables; the temp table holds the
   > incremental data between two runs. I am running this on Iceberg version 0.11.0.
   >
   > My spark conf
   >
   > --conf spark.sql.orc.impl=native
   > --conf spark.sql.orc.enableVectorizedReader=true
   > --conf spark.sql.hive.convertMetastoreOrc=true
   > --conf spark.shuffle.blockTransferService='nio'
   > --conf spark.executor.defaultJavaOptions='-XX:+UseG1GC'
   > --conf spark.hadoop.orc.overwrite.output.file=true
   > --conf spark.driver.defaultJavaOptions='-XX:+UseG1GC'
   > --conf spark.yarn.maxAppAttempts=1
   > --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   > --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
   > --conf spark.sql.catalog.spark_catalog.type=hive
   > --conf spark.sql.catalog.spark_catalog.uri=thrift://METASTORE:9083
   > --conf spark.sql.catalog.hive=org.apache.iceberg.spark.SparkCatalog
   > --conf spark.sql.catalog.hive.type=hive
   > --conf spark.sql.catalog.hive.uri=thrift://METASTORE:9083
   > --conf spark.sql.broadcastTimeout=1500
   >
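   
   For anyone landing here later, here is the shape of the statement in the
   quoted report, spelled out end to end. This is a sketch only: the payload
   column, the source table name, and the incremental filter are hypothetical,
   not the reporter's actual schema.
   
   MERGE INTO db_name.target_iceberg_table target
   USING (
     -- hypothetical incremental slice of another Iceberg table
     SELECT Id, payload
     FROM db_name.source_iceberg_table
     WHERE updated_at > TIMESTAMP '2021-06-23 00:00:00'
   ) temp
   ON target.Id = temp.Id
   WHEN MATCHED THEN UPDATE SET payload = temp.payload
   WHEN NOT MATCHED THEN INSERT *
   
   Nothing in the statement itself is at fault; per the comment above, the zip
   failure comes from the write-side partitioning, not the SQL.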
   

