Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/18 21:05:41 UTC

[GitHub] [iceberg] openinx opened a new issue #2710: Encounter UnsupportedOperationException when describing the Spark table schema after dropping a partition field

openinx opened a new issue #2710:
URL: https://github.com/apache/iceberg/issues/2710


   There is an Iceberg table in my Aliyun cloud environment, with the following `DESCRIBE` output:
   
   ```sql
   ss_sold_time_sk	bigint	
   ss_item_sk	bigint	
   ss_customer_sk	bigint	
   ss_cdemo_sk	bigint	
   ss_hdemo_sk	bigint	
   ss_addr_sk	bigint	
   ss_store_sk	bigint	
   ss_promo_sk	bigint	
   ss_ticket_number	bigint	
   ss_quantity	int	
   ss_wholesale_cost	decimal(7,2)	
   ss_list_price	decimal(7,2)	
   ss_sales_price	decimal(7,2)	
   ss_ext_discount_amt	decimal(7,2)	
   ss_ext_sales_price	decimal(7,2)	
   ss_ext_wholesale_cost	decimal(7,2)	
   ss_ext_list_price	decimal(7,2)	
   ss_ext_tax	decimal(7,2)	
   ss_coupon_amt	decimal(7,2)	
   ss_net_paid	decimal(7,2)	
   ss_net_paid_inc_tax	decimal(7,2)	
   ss_net_profit	decimal(7,2)	
   ss_sold_date_sk	bigint	
   		
   # Partitioning		
   Part 0	bucket(200, ss_sold_date_sk)	
   Time taken: 0.19 seconds, Fetched 26 row(s)
   ```
   
   We then drop the partition field on `ss_sold_date_sk` with the following SQL:
   
   ```sql
   ALTER TABLE emr_dlf_catalog.emr_dlf_db.store_sales DROP PARTITION FIELD bucket(200, ss_sold_date_sk);
   ```
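
   Note that in a format v1 table, `DROP PARTITION FIELD` does not physically remove the field from the partition spec; its transform is replaced with a `void` transform so that existing data files keep a consistent partition layout. For reference, the same spec change can be made through the Iceberg Java API (a minimal sketch; `table` is assumed to be an already-loaded `org.apache.iceberg.Table` for `store_sales`):

   ```java
   import org.apache.iceberg.Table;
   import org.apache.iceberg.expressions.Expressions;

   // Assumes `table` was loaded from the catalog beforehand, e.g.
   // Table table = catalog.loadTable(TableIdentifier.of("emr_dlf_db", "store_sales"));

   // Drop the bucket partition field. In a v1 spec the field stays in place
   // but its transform becomes a void transform -- exactly the transform
   // that DESCRIBE later fails to handle.
   table.updateSpec()
       .removeField(Expressions.bucket("ss_sold_date_sk", 200))
       .commit();
   ```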
   
   When we then try to describe the Iceberg table schema, we hit the following exception:
   
   ```
   spark-sql> describe emr_dlf_catalog.emr_dlf_db.store_sales
            > ;
   21/06/18 12:09:53 INFO [main] BaseMetastoreTableOperations: Refreshing table metadata from new version: oss://emr-iceberg/warehouse/emr_dlf_db.db/store_sales/metadata/00003-7cdf5364-3110-4779-b1ef-23d871fa4cab.metadata.json
   21/06/18 12:09:53 ERROR [main] SparkSQLDriver: Failed in [describe emr_dlf_catalog.emr_dlf_db.store_sales]
   java.lang.UnsupportedOperationException: Void transform is not supported
   	at org.apache.iceberg.transforms.PartitionSpecVisitor.alwaysNull(PartitionSpecVisitor.java:86)
   	at org.apache.iceberg.transforms.PartitionSpecVisitor.visit(PartitionSpecVisitor.java:148)
   	at org.apache.iceberg.transforms.PartitionSpecVisitor.visit(PartitionSpecVisitor.java:120)
   	at org.apache.iceberg.transforms.PartitionSpecVisitor.visit(PartitionSpecVisitor.java:102)
   	at org.apache.iceberg.spark.Spark3Util.toTransforms(Spark3Util.java:258)
   	at org.apache.iceberg.spark.source.SparkTable.partitioning(SparkTable.java:130)
   	at org.apache.spark.sql.execution.datasources.v2.DescribeTableExec.addPartitioning(DescribeTableExec.scala:78)
   	at org.apache.spark.sql.execution.datasources.v2.DescribeTableExec.run(DescribeTableExec.scala:41)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
   	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
   	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
   	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
   	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
   	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496)
   	at scala.collection.Iterator.foreach(Iterator.scala:941)
   	at scala.collection.Iterator.foreach$(Iterator.scala:941)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
   	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
   	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
   	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:282)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
   	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
   	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
   	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
   	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
   	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
   	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
   	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
   
   I think this is a critical bug that we should fix in Apache Iceberg (I have not dug into it yet); I will publish a PR to fix it if it is confirmed.
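
   Judging from the trace, the failure comes from the default `alwaysNull` handler in `PartitionSpecVisitor`, which throws `UnsupportedOperationException`, and the visitor used by `Spark3Util.toTransforms` never overrides it, so any v1 table that has dropped a partition field can no longer be described from Spark. One plausible direction for a fix (a sketch only, written against the visitor API visible in the trace; the final patch may look different) is to map void transforms to `null` and filter them out when building the Spark transforms:

   ```java
   // Sketch inside Spark3Util's PartitionSpecVisitor<Transform> implementation:
   // treat a void transform (the placeholder a v1 spec keeps for a dropped
   // partition field) as "no Spark transform" instead of throwing.
   @Override
   public Transform alwaysNull(int fieldId, String sourceName, int sourceId) {
     return null; // a void transform carries no partitioning information
   }

   // ... and skip the nulls when collecting the visited fields (needs
   // java.util.Objects and java.util.stream.Collectors):
   List<Transform> transforms = PartitionSpecVisitor.visit(spec, visitor).stream()
       .filter(Objects::nonNull)
       .collect(Collectors.toList());
   ```

   Simply skipping the field keeps the Spark-facing behaviour closest to a partition field that no longer exists; alternatively the visitor could render the dropped field explicitly, but that would leak an internal v1 placeholder into the DESCRIBE output.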


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx closed issue #2710: Encounter UnsupportedOperationException when describing the Spark table schema after dropping a partition field

Posted by GitBox <gi...@apache.org>.
openinx closed issue #2710:
URL: https://github.com/apache/iceberg/issues/2710


   



[GitHub] [iceberg] openinx commented on issue #2710: Encounter UnsupportedOperationException when describing the Spark table schema after dropping a partition field

Posted by GitBox <gi...@apache.org>.
openinx commented on issue #2710:
URL: https://github.com/apache/iceberg/issues/2710#issuecomment-863746410





