Posted to issues@spark.apache.org by "Shivu Sondur (JIRA)" <ji...@apache.org> on 2019/04/13 06:50:00 UTC

[jira] [Comment Edited] (SPARK-27421) RuntimeException when querying a view on a partitioned parquet table

    [ https://issues.apache.org/jira/browse/SPARK-27421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816851#comment-16816851 ] 

Shivu Sondur edited comment on SPARK-27421 at 4/13/19 6:49 AM:
---------------------------------------------------------------

[~emaynard]

I am able to reproduce the issue.

FYI: this also happens with "stored as parquet".

Only
{code:java}
spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
{code}
works, but the following query fails:
{code:java}
spark.sql("SELECT * FROM test_view WHERE id = '0'").show(false)
{code}
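As the RuntimeException message itself suggests, setting spark.sql.hive.manageFilesourcePartitions to false works around the failure, at the cost of metastore-side partition pruning. A minimal sketch (assuming the test table and test_view from the reproduction below; note this is a static conf, so it has to be set before the SparkSession is created, e.g. via --conf when launching spark-shell):
{code:java}
// Sketch of the workaround named in the error message. Assumes a fresh JVM:
// spark.sql.hive.manageFilesourcePartitions is a static conf and cannot be
// changed on an already-running session.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.hive.manageFilesourcePartitions", "false")
  .enableHiveSupport()
  .getOrCreate()

// With file-source partition management off, Spark lists partitions itself
// instead of asking the metastore to filter them, so the query no longer
// reaches getPartitionsByFilter (but performance is degraded, as the
// message warns).
spark.sql("SELECT * FROM test_view WHERE id = '0'").show(false)
{code}
Equivalently: spark-shell --conf spark.sql.hive.manageFilesourcePartitions=false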



> RuntimeException when querying a view on a partitioned parquet table
> --------------------------------------------------------------------
>
>                 Key: SPARK-27421
>                 URL: https://issues.apache.org/jira/browse/SPARK-27421
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 2.4.1
>         Environment: Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_141)
>            Reporter: Eric Maynard
>            Priority: Minor
>
> When running a simple query, I get the following stacktrace:
> {code}
> java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
>  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
>  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:686)
>  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:684)
>  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
>  at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:684)
>  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1268)
>  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1261)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1261)
>  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:262)
>  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:957)
>  at org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>  at org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:63)
>  at org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255)
>  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
>  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
>  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
>  at org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>  at org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)
>  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
>  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
>  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>  at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
>  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
>  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
>  at scala.collection.immutable.List.foreach(List.scala:392)
>  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
>  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:66)
>  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:66)
>  at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
>  at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
>  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
>  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
>  at org.apache.spark.sql.execution.QueryExecution$$anonfun$simpleString$1.apply(QueryExecution.scala:188)
>  at org.apache.spark.sql.execution.QueryExecution$$anonfun$simpleString$1.apply(QueryExecution.scala:188)
>  at org.apache.spark.sql.execution.QueryExecution.stringOrError(QueryExecution.scala:99)
>  at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:188)
>  at org.apache.spark.sql.execution.command.ExplainCommand.run(commands.scala:171)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>  at org.apache.spark.sql.Dataset.explain(Dataset.scala:484)
>  at org.apache.spark.sql.Dataset.explain(Dataset.scala:497)
>  ... 49 elided
>  Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:759)
>  ... 99 more
>  Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result$get_partitions_by_filter_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result$get_partitions_by_filter_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result.read(ThriftHiveMetastore.java)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_filter(ThriftHiveMetastore.java:2514)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_filter(ThriftHiveMetastore.java:2498)
>  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:1453)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
>  at com.sun.proxy.$Proxy33.listPartitionsByFilter(Unknown Source)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562)
>  at com.sun.proxy.$Proxy33.listPartitionsByFilter(Unknown Source)
>  at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Hive.java:2820)
>  ... 104 more
> {code}
> I can replicate the issue with the following:
> {code:java}
> spark.sql("CREATE table test (name STRING) partitioned by (id int) STORED AS PARQUET")
> spark.sql("CREATE VIEW test_view as select cast(id as string) as id, name from test")
> spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
> {code}
> If we use a table which isn't stored as parquet, there appears to be no issue:
> {code:java}
> spark.sql("CREATE table test (name STRING) partitioned by (id int)")
> spark.sql("CREATE VIEW test_view as select cast(id as string) as id, name from test")
> spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
> {code}
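One plausible reading of why only the parquet-backed table fails at explain time (a hypothesis based on the stack trace, not verified against the 2.4 code paths): parquet Hive tables are converted to file-source relations (spark.sql.hive.convertMetastoreParquet defaults to true), so partition pruning runs during optimization via PruneFileSourcePartitions and CatalogFileIndex.filterPartitions, exactly the metastore call visible in the stack trace above; .explain alone is enough to trigger it. Non-converted tables prune partitions at execution time instead, which would explain the observation in the comment above that .explain succeeds while .show(false) still fails. A quick sketch to test that hypothesis:
{code:java}
// Hypothesis check (sketch): if the optimizer-time metastore call is the
// trigger, disabling the parquet conversion should make .explain succeed
// for the parquet-backed table too, moving any failure to execution time.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")

spark.sql("SELECT * FROM test_view WHERE id = '0'").explain     // expected: plans without the metastore filter call
spark.sql("SELECT * FROM test_view WHERE id = '0'").show(false) // may still hit the MetaException at scan time
{code}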


