Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:13:00 UTC
[jira] [Resolved] (SPARK-18889) Spark incorrectly reads default columns from a Hive view
[ https://issues.apache.org/jira/browse/SPARK-18889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-18889.
----------------------------------
Resolution: Incomplete
> Spark incorrectly reads default columns from a Hive view
> --------------------------------------------------------
>
> Key: SPARK-18889
> URL: https://issues.apache.org/jira/browse/SPARK-18889
> Project: Spark
> Issue Type: Bug
> Reporter: Salil Surendran
> Priority: Major
> Labels: bulk-closed
>
> Spark fails to read a view that has columns with default (auto-generated) names.
> To reproduce, run the following steps in Hive:
> * CREATE TABLE IF NOT EXISTS employee_details ( eid int, name String,
> salary String, destination String, json String)
> COMMENT 'Employee details'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE;
> * insert into employee_details values(100, "Salil", "100k", "Mumbai", '{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}');
> * create view employee_25 as select eid, name, `_c4` from (select eid, name, destination,v1.foo, cast(v1.bar as timestamp) from employee_details LATERAL VIEW json_tuple(json,'Foo','Bar')v1 as foo, bar)v2;
> * select * from employee_25;
> You will see output like this:
> +------------------+-------------------+------------------+--+
> | employee_25.eid | employee_25.name | employee_25._c4 |
> +------------------+-------------------+------------------+--+
> | 100 | Salil | NULL |
> +------------------+-------------------+------------------+--+
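The `_c4` name comes from Hive's convention of naming an unaliased SELECT expression `_c<i>`, where `i` is the expression's zero-based position in the select list, while plain column references keep their own names. In the inner query `v2`, `cast(v1.bar as timestamp)` is the fifth item (position 4), hence `_c4`. The Scala sketch below models only that rule, as a simplification; `HiveDefaultNames`, `SelectItem`, and `hiveOutputNames` are hypothetical names for illustration, not Hive or Spark APIs.

```scala
// Hypothetical, simplified model of Hive's default column naming.
// Not a Hive or Spark API; written only to illustrate where `_c4` comes from.
object HiveDefaultNames {
  sealed trait SelectItem
  // A plain (possibly qualified) column reference such as `v1.foo`.
  final case class ColumnRef(qualifier: Option[String], name: String) extends SelectItem
  // Any other expression, with an optional explicit AS alias.
  final case class Expr(sql: String, alias: Option[String]) extends SelectItem

  // Column references keep their name, aliased expressions use the alias,
  // and unaliased expressions become `_c<zero-based position>`.
  def hiveOutputNames(items: Seq[SelectItem]): Seq[String] =
    items.zipWithIndex.map {
      case (ColumnRef(_, name), _)   => name
      case (Expr(_, Some(alias)), _) => alias
      case (Expr(_, None), position) => s"_c$position"
    }

  // Select list of the inner query `v2` from the view definition above.
  val innerSelect: Seq[SelectItem] = Seq(
    ColumnRef(None, "eid"),
    ColumnRef(None, "name"),
    ColumnRef(None, "destination"),
    ColumnRef(Some("v1"), "foo"),
    Expr("cast(v1.bar as timestamp)", None)
  )

  def main(args: Array[String]): Unit =
    // Prints: eid, name, destination, foo, _c4
    println(hiveOutputNames(innerSelect).mkString(", "))
}
```

The outer query of `employee_25` then selects `_c4` by that generated name, so the stored view text depends on Hive's naming rule.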
> Now go to spark-shell and try to query the view:
> scala> spark.sql("select * from employee_25").show
> org.apache.spark.sql.AnalysisException: cannot resolve '`v2._c4`' given input columns: [foo, name, eid, bar, destination]; line 1 pos 32;
> 'Project [*]
> +- 'SubqueryAlias employee_25
>    +- 'Project [eid#56, name#57, 'v2._c4]
>       +- SubqueryAlias v2
>          +- Project [eid#56, name#57, destination#59, foo#61, cast(bar#62 as timestamp) AS bar#63]
>             +- Generate json_tuple(json#60, Foo, Bar), true, false, v1, [foo#61, bar#62]
>                +- MetastoreRelation default, employee_details
> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:308)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:308)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:307)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:269)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:279)
> at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:283)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.immutable.List.map(List.scala:285)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:283)
> at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$8.apply(QueryPlan.scala:288)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:186)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:288)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:74)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
> at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58)
> at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
> ... 48 elided
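The analyzed plan shows why resolution fails: when Spark re-analyzes the inner query, it names the unaliased cast after its input column (`cast(bar#62 as timestamp) AS bar#63`), so `v2` exposes `[eid, name, destination, foo, bar]`, which matches the "given input columns" list in the error. The view's stored outer select still refers to Hive's generated name `_c4`, which no longer exists. The Scala sketch below contrasts the two naming rules as inferred from the outputs above; it is a simplified model, and `ViewColumnMismatch`, `hiveNames`, `sparkNames`, and `unresolved` are hypothetical names, not Spark or Hive internals.

```scala
// Hypothetical, simplified model contrasting Hive's and Spark's default names
// for the inner query `v2`; the rules are inferred from the outputs in this report.
object ViewColumnMismatch {
  sealed trait SelectItem
  final case class ColumnRef(name: String) extends SelectItem
  // An unaliased `cast(<column> as <type>)`.
  final case class CastOf(column: String, toType: String) extends SelectItem

  // Hive: an unaliased expression is named `_c<zero-based position>`.
  def hiveNames(items: Seq[SelectItem]): Seq[String] =
    items.zipWithIndex.map {
      case (ColumnRef(n), _) => n
      case (CastOf(_, _), i) => s"_c$i"
    }

  // Spark (as seen in the plan above): a cast over a column keeps the column's name.
  def sparkNames(items: Seq[SelectItem]): Seq[String] =
    items.map {
      case ColumnRef(n)   => n
      case CastOf(col, _) => col
    }

  // View column references that the subquery does not provide.
  def unresolved(viewRefs: Seq[String], available: Seq[String]): Seq[String] =
    viewRefs.filterNot(available.contains)

  val innerSelect: Seq[SelectItem] = Seq(
    ColumnRef("eid"), ColumnRef("name"), ColumnRef("destination"),
    ColumnRef("foo"), CastOf("bar", "timestamp"))

  // Outer select list stored in the view text, written against Hive's names.
  val viewRefs: Seq[String] = Seq("eid", "name", "_c4")

  def main(args: Array[String]): Unit = {
    println(s"Hive:  ${hiveNames(innerSelect).mkString(", ")}")
    println(s"Spark: ${sparkNames(innerSelect).mkString(", ")}")
    println(s"Unresolved under Spark: ${unresolved(viewRefs, sparkNames(innerSelect)).mkString(", ")}")
  }
}
```

A possible workaround, assuming the view can be recreated, is to give the expression an explicit alias in the view definition (for example `cast(v1.bar as timestamp) as bar_ts`, with the outer query selecting `bar_ts`; the name `bar_ts` is illustrative), so the view no longer depends on either engine's default naming.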
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)