Posted to issues@spark.apache.org by "zhouyuan (Jira)" <ji...@apache.org> on 2021/03/08 02:52:00 UTC

[jira] [Comment Edited] (SPARK-34631) Caught Hive MetaException when query by partition (partition col start with ‘$’)

    [ https://issues.apache.org/jira/browse/SPARK-34631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297042#comment-17297042 ] 

zhouyuan edited comment on SPARK-34631 at 3/8/21, 2:51 AM:
-----------------------------------------------------------

1. It can be queried normally in Hive.

2. When I set spark.sql.hive.manageFilesourcePartitions=false, it works.

    But when this setting is enabled, Spark uses the metastore to prune partitions during query planning; with it disabled, I think this will result in degraded performance. Is there any other way to deal with the problem? [~hyukjin.kwon]
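
For reference, a minimal sketch of how the workaround is applied when building the session (the app name is illustrative; the query reuses the table from this issue):

{code:scala}
// Minimal sketch (illustrative names): disable managed file-source partitions
// so Spark does not push the `$`-prefixed partition filter down to the Hive
// metastore, per the workaround suggested in the error message. The cost is
// losing metastore-based partition pruning.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark-34631-workaround")                              // illustrative
  .config("spark.sql.hive.manageFilesourcePartitions", "false")   // the workaround
  .enableHiveSupport()
  .getOrCreate()

// The same query now succeeds, at the cost of degraded planning performance.
spark.sql("select count(*) from some_table where `$partition_date` = '2015-01-01'").show()
{code}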


> Caught Hive MetaException when query by partition (partition col start with ‘$’)
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-34631
>                 URL: https://issues.apache.org/jira/browse/SPARK-34631
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.4.4
>            Reporter: zhouyuan
>            Priority: Major
>
> Create a table stored as Parquet with an explicit location, then run MSCK REPAIR TABLE to pick up the partitions.
> But when querying by the partition column, we get an error (adding backticks does not help):
> {code:sql}
> select count(*) from some_table where `$partition_date` = '2015-01-01'
> {code}
> {panel:title=error:}
> java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
>  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
>  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:679)
>  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:677)
>  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:677)
>  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1221)
>  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1214)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1214)
>  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:254)
>  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:962)
>  at org.apache.spark.sql.hive.execution.HiveTableScanExec.rawPartitions$lzycompute(HiveTableScanExec.scala:174)
>  at org.apache.spark.sql.hive.execution.HiveTableScanExec.rawPartitions(HiveTableScanExec.scala:166)
>  at org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$11.apply(HiveTableScanExec.scala:192)
>  at org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$11.apply(HiveTableScanExec.scala:192)
>  at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2470)
>  at org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:191)
>  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
>  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339)
>  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
>  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3389)
>  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
>  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
>  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
>  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
>  at org.apache.spark.sql.Dataset.head(Dataset.scala:2550)
>  at org.apache.spark.sql.Dataset.take(Dataset.scala:2764)
>  at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)
>  at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)
>  at org.apache.spark.sql.Dataset.show(Dataset.scala:751)
>  at org.apache.spark.sql.Dataset.show(Dataset.scala:710)
>  at org.apache.spark.sql.Dataset.show(Dataset.scala:719)
>  ... 49 elided
> Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Error parsing partition filter : line 1:0 no viable alternative at character '$'
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:759)
>  ... 92 more
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Error parsing partition filter : line 1:0 no viable alternative at character '$'
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result$get_partitions_by_filter_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result$get_partitions_by_filter_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result.read(ThriftHiveMetastore.java)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_filter(ThriftHiveMetastore.java:2216)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_filter(ThriftHiveMetastore.java:2200)
>  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:1103)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
>  at com.sun.proxy.$Proxy15.listPartitionsByFilter(Unknown Source)
>  at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Hive.java:2254)
>  ... 97 more
> {panel}
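
A minimal end-to-end sketch of the repro described above (the table name, schema, and location are assumed; only the `$`-prefixed partition column name matters):

{code:scala}
// Minimal repro sketch (assumed schema and location; only the '$'-prefixed
// partition column name is essential to trigger the bug).
spark.sql("""
  CREATE TABLE some_table (id BIGINT)
  PARTITIONED BY (`$partition_date` STRING)
  STORED AS PARQUET
  LOCATION '/path/to/some_table'
""")

// Register the partitions that already exist under the table location.
spark.sql("MSCK REPAIR TABLE some_table")

// Fails with "MetaException ... no viable alternative at character '$'":
// the metastore's partition-filter parser rejects the '$' in the column name.
spark.sql("select count(*) from some_table where `$partition_date` = '2015-01-01'").show()
{code}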


