Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2016/06/02 04:58:59 UTC
[jira] [Updated] (SPARK-15620) Dataset.map creates a dataset that can't be self-joined
[ https://issues.apache.org/jira/browse/SPARK-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan updated SPARK-15620:
--------------------------------
Assignee: Saisai Shao
> Dataset.map creates a dataset that can't be self-joined
> -------------------------------------------------------
>
> Key: SPARK-15620
> URL: https://issues.apache.org/jira/browse/SPARK-15620
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1
> Environment: EC2, Spark-shell
> Reporter: Tim Gautier
> Assignee: Saisai Shao
> Fix For: 2.0.0
>
>
> Given this case class and Dataset:
> {code}
> case class Test(id: Int)
> val test = Seq(
> Test(1),
> Test(2),
> Test(3)
> ).toDS
> {code}
> 'test' can be joined with itself successfully:
> {code}
> test.as("t1").joinWith(test.as("t2"), $"t1.id" === $"t2.id").show
> {code}
> However, mapping 'test' like this
> {code}
> val testMapped = test.map(t => t.copy(id = t.id + 1))
> {code}
> results in a new Dataset that can't be joined with itself:
> {code}
> testMapped.as("t1").joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id").show
> {code}
> Yields:
> {noformat}
> scala> testMapped.as("t1").joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id").show
> org.apache.spark.sql.AnalysisException: cannot resolve 't1.id' given input columns: [id];
> {noformat}
> This also throws an error:
> {code}
> val testMapped2 = test.map(_.id)
> testMapped2.as("t1").joinWith(testMapped2.as("t2"), $"t1.value" === $"t2.value").show
> {code}