Posted to issues@spark.apache.org by "Liwei Lin (JIRA)" <ji...@apache.org> on 2016/07/12 23:41:20 UTC
[jira] [Commented] (SPARK-16506) Subsequent dataframe join dont work

    [ https://issues.apache.org/jira/browse/SPARK-16506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373963#comment-15373963 ] 

Liwei Lin commented on SPARK-16506:
-----------------------------------

Hi [~timotta], thanks for reporting this. It can be reproduced in 2.0 as well. Let me look into it and submit a patch. Thanks.

> Subsequent dataframe join dont work
> -----------------------------------
>
>                 Key: SPARK-16506
>                 URL: https://issues.apache.org/jira/browse/SPARK-16506
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2
>            Reporter: Tiago Albineli Motta
>            Priority: Minor
>              Labels: bug, dataframe, error, join, joins, sql
>
> Here is the example code:
> {quote}
>       import sql.implicits._
>       val objs = sc.parallelize(Seq(("1", "um"), ("2", "dois"), ("3", "tres"))).toDF.selectExpr("_1 as id", "_2 as name")
>       
>       val rawj = sc.parallelize(Seq(("1", "2"),  ("1", "3"), ("2", "3"), ("2", "1"))).toDF.selectExpr("_1 as id1", "_2 as id2")
>       
>       val join1 = rawj.join(objs, objs("id") === rawj("id1"))
>         .withColumnRenamed("id", "anything")
>         
>       println("works...")
>       val join2a = join1.join(objs, 'id2 === 'id )
>       join2a.show()
>       
>       println("works...")
>       val join2b = objs.join(join1, objs("id") === join1("id2"))
>       join2b.show()
>       
>       println("does not work...")
>       val join2c = join1.join(objs, join1("id2") === objs("id") )
>       join2c.show()
> {quote}
> The first two joins work, but the last one fails with this error:
> {quote}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) id#2 missing from anything#8,name#14,name#3,id1#6,id2#7,id#13 in operator !Join Inner, Some((id2#7 = id#2));
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:183)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:105)
> {quote}
> Without the first column rename, the failure is silent: no exception is thrown, but the join returns an empty result:
> {quote}
>       import sql.implicits._
>       val objs = sc.parallelize(Seq(("1", "um"), ("2", "dois"), ("3", "tres"))).toDF.selectExpr("_1 as id", "_2 as name")
>       
>       val rawj = sc.parallelize(Seq(("1", "2"),  ("1", "3"), ("2", "3"), ("2", "1"))).toDF.selectExpr("_1 as id1", "_2 as id2")
>       
>       val join1 = rawj.join(objs, objs("id") === rawj("id1"))
>       
>       println("does not work...")
>       val join2c = join1.join(objs, join1("id2") === objs("id") )
>       join2c.show()
> {quote}
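
A possible workaround, sketched here for illustration only: it is not part of the original report or of any patch, and it has not been verified against 1.6.2. The error message suggests that objs("id") in the failing join still refers to the attribute id#2 from the first use of objs, which is no longer in the join's input after the rename. Giving the second use of objs an alias and referring to its column by qualified name may avoid that. The alias "objs2" and the object name AliasJoinSketch are made up for this example; a local master is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.col

// Hypothetical driver object; the name is an assumption for this sketch.
object AliasJoinSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SPARK-16506-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sql = new SQLContext(sc)
    import sql.implicits._

    // Same data as in the report.
    val objs = sc.parallelize(Seq(("1", "um"), ("2", "dois"), ("3", "tres")))
      .toDF.selectExpr("_1 as id", "_2 as name")
    val rawj = sc.parallelize(Seq(("1", "2"), ("1", "3"), ("2", "3"), ("2", "1")))
      .toDF.selectExpr("_1 as id1", "_2 as id2")

    val join1 = rawj.join(objs, objs("id") === rawj("id1"))
      .withColumnRenamed("id", "anything")

    // Alias the second use of objs ("objs2" is an assumed name) and refer to
    // its column by qualified name instead of through objs("id").
    val objs2 = objs.as("objs2")
    val join2c = join1.join(objs2, join1("id2") === col("objs2.id"))
    join2c.show()

    sc.stop()
  }
}
```

The same aliasing could be tried on the second example (without the rename), where the join otherwise comes back empty.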


