Posted to issues@spark.apache.org by "Marco Gaido (JIRA)" <ji...@apache.org> on 2019/01/30 10:59:00 UTC

[jira] [Resolved] (SPARK-26782) Wrong column resolved when joining twice with the same dataframe

     [ https://issues.apache.org/jira/browse/SPARK-26782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Gaido resolved SPARK-26782.
---------------------------------
    Resolution: Duplicate

> Wrong column resolved when joining twice with the same dataframe
> ----------------------------------------------------------------
>
>                 Key: SPARK-26782
>                 URL: https://issues.apache.org/jira/browse/SPARK-26782
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Vladimir Prus
>            Priority: Major
>
> # Execute the following code:
>  
> {code:java}
> {
>   val events = Seq(("a", 0)).toDF("id", "ts")
>   val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")
>
>   val dimOriginal = dim.as("dim")
>   val dimShifted = dim.as("dimShifted")
>
>   val r = events
>     .join(dimOriginal, "id")
>     .where(dimOriginal("start") <= $"ts" && $"ts" < dimOriginal("end"))
>   val r2 = r
>     .join(dimShifted, "id")
>     .where(dimShifted("start") <= $"ts" + 24 && $"ts" + 24 < dimShifted("end"))
>
>   r2.show()
>   r2.explain(true)
> }
> {code}
>  
>  # Expected effect:
>  ** One row is shown
>  ** Logical plan shows two independent joins with "dim" and "dimShifted"
>  # Observed effect:
>  ** No rows are printed.
>  ** Logical plan shows two filters are applied:
>  *** 'Filter ((start#17 <= ('ts + 24)) && (('ts + 24) < end#18))'
>  *** Filter ((start#17 <= ts#6) && (ts#6 < end#18))
>  ** Both filters refer to the same start#17 and end#18 columns, so they are applied to the same dataframe rather than to two different ones.
>  ** It appears that dimShifted("start") is resolved to be identical to dimOriginal("start")
>  # I get the desired effect if I replace the second where with 
> {code:java}
> .where($"dimShifted.start" <= $"ts" + 24 && $"ts" + 24 < $"dimShifted.end")
> {code}
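
For reference, the workaround described in the report can be sketched as a standalone program. This is only an illustration assembled from the snippets above, not a confirmed fix: the object name, the local SparkSession setup, and the app name are assumptions added so the code compiles outside spark-shell. The only change to the reporter's logic is that the second filter refers to columns through the "dimShifted" alias by name instead of through dimShifted("start").

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; the original report runs the body in spark-shell.
object Spark26782Workaround {
  def main(args: Array[String]): Unit = {
    // Assumed local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-26782-workaround-sketch")
      .getOrCreate()
    import spark.implicits._ // provides toDF and the $"..." column syntax

    val events = Seq(("a", 0)).toDF("id", "ts")
    val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")

    // Both aliases wrap the same underlying dataframe.
    val dimOriginal = dim.as("dim")
    val dimShifted = dim.as("dimShifted")

    // First join and filter, unchanged from the report.
    val r = events
      .join(dimOriginal, "id")
      .where(dimOriginal("start") <= $"ts" && $"ts" < dimOriginal("end"))

    // Second join: columns are referenced by alias-qualified name,
    // which is the replacement the reporter says gives the desired effect.
    val r2 = r
      .join(dimShifted, "id")
      .where($"dimShifted.start" <= $"ts" + 24 && $"ts" + 24 < $"dimShifted.end")

    r2.show()
    r2.explain(true)

    spark.stop()
  }
}
```

The alias-qualified form leaves resolution to the analyzer, which looks the name up under the "dimShifted" qualifier, whereas dimShifted("start") hands back a column that, per the observation above, appears to be identical to dimOriginal("start").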



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
