You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Peter Toth (Jira)" <ji...@apache.org> on 2020/10/09 07:48:00 UTC

[jira] [Commented] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type

    [ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210689#comment-17210689 ] 

Peter Toth commented on SPARK-33098:
------------------------------------

I've started to look into this issue.

> Exception when using 'in' to compare a partition column to a literal with the wrong type
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-33098
>                 URL: https://issues.apache.org/jira/browse/SPARK-33098
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> Comparing a partition column against a literal with the wrong type works if you use equality ('='). However, if you use 'in', you get:
> {noformat}
> MetaException(message:Filtering is supported only on partition keys of type string)
> {noformat}
> For example:
> {noformat}
> spark-sql> create table test (a int) partitioned by (b int) stored as parquet;
> Time taken: 0.323 seconds
> spark-sql> insert into test values (1, 1), (1, 2), (2, 2);
> 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test
> 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test
> 20/10/08 19:57:14 WARN log: Updated size to 418
> 20/10/08 19:57:14 WARN log: Updated size to 836
> Time taken: 2.124 seconds
> spark-sql> -- this works, of course
> spark-sql> select * from test where b in (2);
> 1	2
> 2	2
> Time taken: 0.13 seconds, Fetched 2 row(s)
> spark-sql> -- this also works (equals with wrong type)
> spark-sql> select * from test where b = '2';
> 1	2
> 2	2
> Time taken: 0.132 seconds, Fetched 2 row(s)
> spark-sql> -- this does not work ('in' with wrong type)
> spark-sql> select * from test where b in ('2');
> 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b in ('2')]
> java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
> 	at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
> -
> -
> -
> Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
> {noformat}
> There are also interesting variations of this using the dataframe API:
> {noformat}
> scala> sql("select cast(b as string) as b from test where b in (2)").show(false)
> +---+
> |b  |
> +---+
> |2  |
> |2  |
> +---+
> scala> sql("select cast(b as string) as b from test").filter("b in (2)").show(false)
> java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
>   at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
> -
> -
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org