Posted to issues@spark.apache.org by "Nicholas Chammas (JIRA)" <ji...@apache.org> on 2016/07/07 20:15:11 UTC
[jira] [Updated] (SPARK-15441) dataset outer join seems to return incorrect result
[ https://issues.apache.org/jira/browse/SPARK-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas Chammas updated SPARK-15441:
-------------------------------------
Component/s: SQL (was: sq;)
> dataset outer join seems to return incorrect result
> ---------------------------------------------------
>
> Key: SPARK-15441
> URL: https://issues.apache.org/jira/browse/SPARK-15441
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Wenchen Fan
> Priority: Critical
> Fix For: 2.0.0
>
>
> See notebook
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6122906529858466/2836020637783173/5382278320999420/latest.html
> {code}
> import org.apache.spark.sql.functions
> val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4)).toDS()
> val right = List(("a", "x"), ("b", "y"), ("d", "z")).toDS()
> // In the last row, _1 should be null rather than (null,-1)
> left.toDF("k", "v").as[(String, Int)].alias("left")
>   .joinWith(right.toDF("k", "u").as[(String, String)].alias("right"), functions.col("left.k") === functions.col("right.k"), "right_outer")
>   .show()
> {code}
> The result currently returned is:
> {code}
> +---------+-----+
> |       _1|   _2|
> +---------+-----+
> |    (a,2)|(a,x)|
> |    (a,1)|(a,x)|
> |    (b,3)|(b,y)|
> |(null,-1)|(d,z)|
> +---------+-----+
> {code}
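For reference, the right outer join semantics the report expects can be modelled with plain Scala collections, no Spark required. This is only a sketch: `RightOuterJoinSketch` and `rightOuterJoin` are hypothetical names introduced for illustration, not Spark APIs, and `None` stands in for the null that `_1` should hold when a right-hand row has no match.

```scala
// Plain-Scala model of a right outer join over keyed tuples.
// Assumption: rightOuterJoin is a hypothetical helper, not part of Spark.
object RightOuterJoinSketch {
  def rightOuterJoin[K, A, B](
      left: Seq[(K, A)],
      right: Seq[(K, B)]
  ): Seq[(Option[(K, A)], (K, B))] =
    right.flatMap { r =>
      // All left rows whose key equals the key of the current right row.
      val matched: Seq[(K, A)] = left.filter(_._1 == r._1)
      // With no match, the whole left side is absent (None),
      // not a tuple filled with default field values.
      val leftSide: Seq[Option[(K, A)]] =
        if (matched.isEmpty) Seq(None) else matched.map(Some(_))
      leftSide.map(l => (l, r))
    }

  // Same data as in the report.
  val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4))
  val right = List(("a", "x"), ("b", "y"), ("d", "z"))

  def main(args: Array[String]): Unit =
    rightOuterJoin(left, right).foreach(println)
}
```

In this model the row for `(d,z)` carries `None` on the left, which corresponds to `_1` being null as a whole rather than `(null,-1)`.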