Posted to issues@spark.apache.org by "MIK (JIRA)" <ji...@apache.org> on 2018/08/08 21:33:00 UTC

[jira] [Comment Edited] (SPARK-25051) where clause on dataset gives AnalysisException

    [ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573898#comment-16573898 ] 

MIK edited comment on SPARK-25051 at 8/8/18 9:32 PM:
-----------------------------------------------------

Both df1 and df2 are read from S3 files:

df1 = spark.read.format("csv").option("header", "false").
                 option("codec", "org.apache.hadoop.io.compress.GzipCodec").
                 option("sep", "\t").schema(schema).load(datafile)


was (Author: mik1007):
df1 and df2, both are reading from S3 files

df1 = spark.read.format("csv").option("header", "false").
                option("codec", "org.apache.hadoop.io.compress.GzipCodec").
                option("sep", "\t").schema(appUsageSchema).load(datafile)

> where clause on dataset gives AnalysisException
> -----------------------------------------------
>
>                 Key: SPARK-25051
>                 URL: https://issues.apache.org/jira/browse/SPARK-25051
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.0
>            Reporter: MIK
>            Priority: Major
>
> *schemas :*
> df1
> => id ts
> df2
> => id name country
> *code:*
> val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)
> *error*:
> org.apache.spark.sql.AnalysisException: Resolved attribute(s) id#0 missing from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in operator !Filter isnull(id#0). Attribute(s) with the same name appear in the operation: id. Please check if the right attribute(s) are used.;;
>  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
>     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
>     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
>     at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
>     at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
>     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
>     at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
>     at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
>     at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
>     at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)
> This works fine in Spark 2.2.2.
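
As a possible workaround (a sketch only, not verified against 2.3.0), the same "rows of df1 with no match in df2" semantics can be expressed with a left_anti join, which avoids referencing a deduplicated join-key attribute from df2 after the join. The second variant, filtering on a non-key column from the right side, is a common alternative; both assume df1 and df2 are loaded as in the report above.

```scala
// Workaround sketch for SPARK-25051: a left_anti join keeps exactly the rows
// of df1 whose "id" has no match in df2, so no column of df2 needs to be
// referenced after the join.
val unmatched = df1.join(df2, Seq("id"), "left_anti")

// Alternative: keep the left_outer join but filter on a non-key column from
// the right side (here "name", from the schema in the report). Non-key
// columns are not deduplicated by a Seq("id") join, so df2("name") should
// remain resolvable in the joined plan.
val unmatched2 = df1.join(df2, Seq("id"), "left_outer").where(df2("name").isNull)
```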



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org