Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2016/06/02 04:58:59 UTC

[jira] [Updated] (SPARK-15620) Dataset.map creates a dataset that can't be self-joined

     [ https://issues.apache.org/jira/browse/SPARK-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan updated SPARK-15620:
--------------------------------
    Assignee: Saisai Shao

> Dataset.map creates a dataset that can't be self-joined
> -------------------------------------------------------
>
>                 Key: SPARK-15620
>                 URL: https://issues.apache.org/jira/browse/SPARK-15620
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1
>         Environment: EC2, Spark-shell
>            Reporter: Tim Gautier
>            Assignee: Saisai Shao
>             Fix For: 2.0.0
>
>
> Given this case class and Dataset:
> {code}
> case class Test(id: Int)
> val test = Seq(
>   Test(1),
>   Test(2),
>   Test(3)
> ).toDS
> {code}
> 'test' can be joined with itself successfully:
> {code}
> test.as("t1").joinWith(test.as("t2"), $"t1.id" === $"t2.id").show
> {code}
> However, mapping 'test' like this
> {code}
> val testMapped = test.map(t => t.copy(id = t.id + 1))
> {code}
> results in a new Dataset that can't be joined to itself:
> {code}
> testMapped.as("t1").joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id").show
> {code}
> Yields:
> {noformat}
> scala> testMapped.as("t1").joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id").show
> org.apache.spark.sql.AnalysisException: cannot resolve 't1.id' given input columns: [id];
> {noformat}
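> A possible workaround on 1.6.1 (an untested sketch, assuming the spark-shell 'sqlContext' and imported implicits as above) is to rebuild the mapped Dataset from its underlying RDD, so the self-join resolves against a fresh plan rather than the mapped one:
> {code}
> // Hypothetical workaround sketch: round-trip through the RDD so the
> // rebuilt Dataset is backed by a fresh logical plan.
> val testRebuilt = sqlContext.createDataset(testMapped.rdd)
> testRebuilt.as("t1").joinWith(testRebuilt.as("t2"), $"t1.id" === $"t2.id").show
> {code}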
> The same failure occurs with a primitive-typed Dataset:
> {code}
> val testMapped2 = test.map(_.id)
> testMapped2.as("t1").joinWith(testMapped2.as("t2"), $"t1.value" === $"t2.value").show
> {code}
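> The rebuild sketch above would presumably apply here too ('value' being the default column name Spark assigns a Dataset[Int]):
> {code}
> // Same hypothetical rebuild for the primitive-typed case.
> val rebuilt2 = sqlContext.createDataset(testMapped2.rdd)
> rebuilt2.as("t1").joinWith(rebuilt2.as("t2"), $"t1.value" === $"t2.value").show
> {code}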


