
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/14 09:04:40 UTC

[GitHub] [iceberg] felixYyu opened a new issue #4114: Spark: select * from table.partitions Exception

felixYyu opened a new issue #4114:
URL: https://github.com/apache/iceberg/issues/4114


   Spark 3.2.1
   Iceberg 0.13.0
   
   1. Create a table partitioned by hours(ts), insert overwrite data, and drop the partition field hours(ts). Running `select * from table.partitions` with Spark SQL then throws this exception:
   ```
   Caused by: java.lang.IllegalStateException: Unknown type for long field. Type name: java.lang.Integer
   	at org.apache.iceberg.spark.source.StructInternalRow.getLong(StructInternalRow.java:146)
   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
   	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
   	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:127)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   ```
   
   2. Create a table partitioned by bucket(5, id), insert overwrite data, and drop the partition field bucket(5, id). Running `select * from table.partitions` with Spark SQL then throws this exception:
   ```
   Wrong class, java.lang.Long, for object: 0
   Exception in thread "main" java.lang.IllegalArgumentException: Wrong class, java.lang.Long, for object: 0
   	at org.apache.iceberg.PartitionData.get(PartitionData.java:120)
   	at org.apache.iceberg.types.Comparators$StructLikeComparator.compare(Comparators.java:126)
   	at org.apache.iceberg.types.Comparators$StructLikeComparator.compare(Comparators.java:102)
   	at org.apache.iceberg.util.StructLikeWrapper.equals(StructLikeWrapper.java:76)
   	at java.util.HashMap.getNode(HashMap.java:572)
   	at java.util.HashMap.get(HashMap.java:557)
   	at org.apache.iceberg.PartitionsTable$PartitionMap.get(PartitionsTable.java:153)
   	at org.apache.iceberg.PartitionsTable.partitions(PartitionsTable.java:101)
   	at org.apache.iceberg.PartitionsTable.task(PartitionsTable.java:75)
   	at org.apache.iceberg.PartitionsTable.access$300(PartitionsTable.java:35)
   	at org.apache.iceberg.PartitionsTable$PartitionsScan.lambda$new$0(PartitionsTable.java:138)
   	at org.apache.iceberg.StaticTableScan.planFiles(StaticTableScan.java:66)
   	at org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:193)
   	at org.apache.iceberg.spark.source.SparkBatchQueryScan.files(SparkBatchQueryScan.java:114)
   	at org.apache.iceberg.spark.source.SparkBatchQueryScan.tasks(SparkBatchQueryScan.java:128)
   	at org.apache.iceberg.spark.source.SparkBatchScan.planInputPartitions(SparkBatchScan.java:141)
   	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions$lzycompute(BatchScanExec.scala:52)
   	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions(BatchScanExec.scala:52)
   	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar(DataSourceV2ScanExecBase.scala:93)
   	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar$(DataSourceV2ScanExecBase.scala:92)
   	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.supportsColumnar(BatchScanExec.scala:35)
   	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:123)
   	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
   ```
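
   The two scenarios above can be sketched as spark-shell statements. This is one reading of the steps, not the reporter's actual statements: the table names (`hours_test`, `bucket_test`), the column definitions, and the inserted values are hypothetical, and the sketch assumes a catalog named `iceberg` with the Iceberg SQL extensions enabled, which `DROP PARTITION FIELD` requires.

   ```scala
   // Assumes a spark-shell session with an Iceberg catalog named `iceberg`
   // and the Iceberg SQL extensions enabled. All table/column names are hypothetical.

   // Scenario 1: table created with an hours(ts) partition field.
   spark.sql("CREATE TABLE iceberg.default.hours_test (id INT, ts TIMESTAMP) USING iceberg PARTITIONED BY (hours(ts))")
   spark.sql("INSERT OVERWRITE iceberg.default.hours_test VALUES (1, TIMESTAMP '2022-02-14 09:00:00')")
   spark.sql("ALTER TABLE iceberg.default.hours_test DROP PARTITION FIELD hours(ts)")
   // The reporter sees the IllegalStateException on a query like this one.
   spark.sql("SELECT * FROM iceberg.default.hours_test.partitions").show()

   // Scenario 2: table created with a bucket(5, id) partition field.
   spark.sql("CREATE TABLE iceberg.default.bucket_test (id INT, test STRING) USING iceberg PARTITIONED BY (bucket(5, id))")
   spark.sql("INSERT OVERWRITE iceberg.default.bucket_test VALUES (5, 'foobar')")
   spark.sql("ALTER TABLE iceberg.default.bucket_test DROP PARTITION FIELD bucket(5, id)")
   // The reporter sees the IllegalArgumentException on a query like this one.
   spark.sql("SELECT * FROM iceberg.default.bucket_test.partitions").show()
   ```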


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on issue #4114: Spark: select * from table.partitions Exception

Posted by GitBox <gi...@apache.org>.

RussellSpitzer commented on issue #4114:
URL: https://github.com/apache/iceberg/issues/4114#issuecomment-1039653819


   Tried to repro this, but couldn't reproduce the issue. Are you sure your classpath doesn't have entries from previous Iceberg versions? The error looks like it may come from the older partition metadata code.
   
   ```
   scala> spark.sql("create table iceberg.default.ptest (id int, test string) location '/Users/russellspitzer/Temp/ffz'")
   res6: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("insert into table iceberg.default.ptest Values (5, 'foobar')")
   
   scala> spark.sql("ALTER table iceberg.default.ptest ADD PARTITION FIELD bucket(5, id)")
   res8: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("insert into table iceberg.default.ptest Values (6, 'foobar')")
   res9: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("SELECT * from iceberg.default.ptest.partitions")
   res10: org.apache.spark.sql.DataFrame = [partition: struct<id_bucket_5: int>, record_count: bigint ... 1 more field]
   
   scala> spark.sql("SELECT * from iceberg.default.ptest.partitions").show
   +---------+------------+----------+
   |partition|record_count|file_count|
   +---------+------------+----------+
   |   {null}|           1|         1|
   |      {4}|           1|         1|
   +---------+------------+----------+
   
   
   scala> spark.sql("ALTER TABLE iceberg.default.ptest DROP PARTITION FIELD bucket(5, id)")
   res12: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("SELECT * from iceberg.default.ptest.partitions").show
   +---------+------------+----------+
   |partition|record_count|file_count|
   +---------+------------+----------+
   |   {null}|           1|         1|
   |      {4}|           1|         1|
   +---------+------------+----------+
   
   
   scala> spark.sql("insert into table iceberg.default.ptest Values (7, 'foobar')")
   res14: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("SELECT * from iceberg.default.ptest.partitions").show
   +---------+------------+----------+
   |partition|record_count|file_count|
   +---------+------------+----------+
   |   {null}|           2|         2|
   |      {4}|           1|         1|
   +---------+------------+----------+
   ```
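
   One way to check the classpath question above is to ask the JVM where a class was actually loaded from. The following is a minimal sketch using only standard JVM calls. The helper name `loadedFrom` is hypothetical, and `org.apache.iceberg.PartitionsTable` is chosen only because it appears in the reported stack trace.

   ```scala
   import scala.util.Try

   // Hypothetical helper: returns the jar or directory a class was loaded from.
   // getCodeSource is null for classes loaded by the bootstrap loader.
   def loadedFrom(cls: Class[_]): String =
     Option(cls.getProtectionDomain.getCodeSource)
       .flatMap(cs => Option(cs.getLocation))
       .map(_.toString)
       .getOrElse("<unknown: bootstrap or generated class>")

   // Where does the Iceberg class from the stack trace come from?
   // Yields a Failure if Iceberg is not on the classpath at all.
   val origin = Try(Class.forName("org.apache.iceberg.PartitionsTable")).map(loadedFrom)
   println(origin)

   // JVM classpath entries that mention "iceberg"; more than one version here
   // would be suspicious. Jars added at runtime may not be listed.
   val icebergEntries = System.getProperty("java.class.path")
     .split(java.io.File.pathSeparator)
     .filter(_.toLowerCase.contains("iceberg"))
   icebergEntries.foreach(println)
   ```

   If the printed location points at a jar from an older Iceberg release, that jar is likely shadowing the 0.13.0 classes. The per-class check is the more reliable of the two, since `java.class.path` does not necessarily include jars that Spark adds through its own class loader.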


[GitHub] [iceberg] felixYyu commented on issue #4114: Spark: select * from table.partitions Exception

Posted by GitBox <gi...@apache.org>.

felixYyu commented on issue #4114:
URL: https://github.com/apache/iceberg/issues/4114#issuecomment-1042486163


   Thanks @RussellSpitzer, I will test again and close this issue.



[GitHub] [iceberg] felixYyu closed issue #4114: Spark: select * from table.partitions Exception

Posted by GitBox <gi...@apache.org>.

felixYyu closed issue #4114:
URL: https://github.com/apache/iceberg/issues/4114


   

