Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2014/09/13 21:15:33 UTC
[jira] [Reopened] (SPARK-3414) Case insensitivity breaks when
unresolved relation contains attributes with uppercase letters in their
names
[ https://issues.apache.org/jira/browse/SPARK-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust reopened SPARK-3414:
-------------------------------------
Assignee: Michael Armbrust (was: Cheng Lian)
> Case insensitivity breaks when unresolved relation contains attributes with uppercase letters in their names
> ------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-3414
> URL: https://issues.apache.org/jira/browse/SPARK-3414
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.0.2
> Reporter: Cheng Lian
> Assignee: Michael Armbrust
> Priority: Critical
> Fix For: 1.2.0
>
>
> Paste the following snippet into {{spark-shell}} (Hive support is required) to reproduce this issue:
> {code}
> import org.apache.spark.sql.hive.HiveContext
> val hiveContext = new HiveContext(sc)
> import hiveContext._
> case class LogEntry(filename: String, message: String)
> case class LogFile(name: String)
> sc.makeRDD(Seq.empty[LogEntry]).registerTempTable("rawLogs")
> sc.makeRDD(Seq.empty[LogFile]).registerTempTable("logFiles")
> val srdd = sql(
>   """
>     SELECT name, message
>     FROM rawLogs
>     JOIN (
>       SELECT name
>       FROM logFiles
>     ) files
>     ON rawLogs.filename = files.name
>   """)
> srdd.registerTempTable("boom")
> sql("select * from boom")
> {code}
> Exception thrown:
> {code}
> SchemaRDD[7] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: *, tree:
> Project [*]
>  LowerCaseSchema
>   Subquery boom
>    Project ['name,'message]
>     Join Inner, Some(('rawLogs.filename = name#2))
>      LowerCaseSchema
>       Subquery rawlogs
>        SparkLogicalPlan (ExistingRdd [filename#0,message#1], MapPartitionsRDD[1] at mapPartitions at basicOperators.scala:208)
>      Subquery files
>       Project [name#2]
>        LowerCaseSchema
>         Subquery logfiles
>          SparkLogicalPlan (ExistingRdd [name#2], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:208)
> {code}
> Notice that {{rawLogs}} in the join operator is not lowercased.
> The reason is that, during the analysis phase, the {{CaseInsensitiveAttributeReferences}} batch runs only before the {{Resolution}} batch. When {{srdd}} is registered as the temporary table {{boom}}, its original (unanalyzed) logical plan is stored in the catalog:
> {code}
> Join Inner, Some(('rawLogs.filename = 'files.name))
>  UnresolvedRelation None, rawLogs, None
>  Subquery files
>   Project ['name]
>    UnresolvedRelation None, logFiles, None
> {code}
> Notice that the attributes referenced in the join operator (esp. {{rawLogs}}) are not lowercased yet.
> Then, when {{select * from boom}} is analyzed, its input logical plan is:
> {code}
> Project [*]
>  UnresolvedRelation None, boom, None
> {code}
> Here the unresolved relation points to the unanalyzed logical plan of {{srdd}} above. That plan is only substituted in later, by the {{ResolveRelations}} rule, so {{CaseInsensitiveAttributeReferences}} never touches it and {{rawLogs.filename}} is never lowercased:
> {code}
> === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
>  Project [*]                            Project [*]
> ! UnresolvedRelation None, boom, None    LowerCaseSchema
> !                                         Subquery boom
> !                                          Project ['name,'message]
> !                                           Join Inner, Some(('rawLogs.filename = 'files.name))
> !                                            LowerCaseSchema
> !                                             Subquery rawlogs
> !                                              SparkLogicalPlan (ExistingRdd [filename#0,message#1], MapPartitionsRDD[1] at mapPartitions at basicOperators.scala:208)
> !                                            Subquery files
> !                                             Project ['name]
> !                                              LowerCaseSchema
> !                                               Subquery logfiles
> !                                                SparkLogicalPlan (ExistingRdd [name#2], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:208)
> {code}
> A reasonable fix would be to always register the analyzed logical plan in the catalog when registering temporary tables.
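
The ordering problem described above can be sketched with a toy model in plain Scala. This is only an illustration and does not use Catalyst: {{Plan}}, {{Relation}}, {{Scan}}, {{Filter}} and {{ToyAnalyzer}} are hypothetical stand-ins, and the only assumptions carried over from the report are that lowercasing runs before relation resolution and that catalog lookups ignore case.

```scala
// Toy model only: these types are hypothetical stand-ins, not Catalyst classes.
sealed trait Plan
// An unresolved reference to a table registered in the catalog.
case class Relation(name: String) extends Plan
// A resolved base table.
case class Scan(table: String) extends Plan
// Any operator that refers to an attribute by name (a join condition, say).
case class Filter(attr: String, child: Plan) extends Plan

object ToyAnalyzer {
  type Catalog = Map[String, Plan]

  // Stand-in for the lowercasing batch: it only sees the tree as it exists
  // before any relation has been replaced by its stored plan.
  def lowercase(plan: Plan): Plan = plan match {
    case Relation(n)  => Relation(n.toLowerCase)
    case Filter(a, c) => Filter(a.toLowerCase, lowercase(c))
    case s: Scan      => s
  }

  // Stand-in for relation resolution: splices in whatever plan the catalog
  // holds. The lookup ignores case, but the spliced plan is not lowercased.
  def resolve(plan: Plan, catalog: Catalog): Plan = plan match {
    case Relation(n)  => resolve(catalog(n.toLowerCase), catalog)
    case Filter(a, c) => Filter(a, resolve(c, catalog))
    case s: Scan      => s
  }

  // Lowercasing runs first, resolution second, and lowercasing never reruns.
  def analyze(plan: Plan, catalog: Catalog): Plan =
    resolve(lowercase(plan), catalog)

  // Collects the attribute names that remain in a plan.
  def attrs(plan: Plan): Seq[String] = plan match {
    case Filter(a, c) => a +: attrs(c)
    case _            => Seq.empty
  }

  val base: Catalog = Map("rawlogs" -> Scan("rawlogs"))
  val query: Plan = Filter("rawLogs.filename", Relation("rawLogs"))

  // Current behaviour: the unanalyzed plan is stored under "boom".
  val buggyResult: Plan =
    analyze(Relation("boom"), base + ("boom" -> query))

  // Proposed fix: analyze the plan before storing it.
  val fixedResult: Plan =
    analyze(Relation("boom"), base + ("boom" -> analyze(query, base)))

  def main(args: Array[String]): Unit = {
    println(attrs(buggyResult)) // attribute keeps its original case: rawLogs.filename
    println(attrs(fixedResult)) // attribute is lowercased: rawlogs.filename
  }
}
```

In this sketch the plan stored unanalyzed keeps {{rawLogs.filename}} after analysis, while the plan analyzed before registration ends up with {{rawlogs.filename}}. That matches the behaviour the proposed fix is meant to achieve, though the real change would live in the temporary-table registration path rather than in the analyzer rules.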