Posted to issues@spark.apache.org by "Vladimir Prus (JIRA)" <ji...@apache.org> on 2019/01/30 10:53:00 UTC
[jira] [Created] (SPARK-26782) Wrong column resolved when joining twice with the same dataframe
Vladimir Prus created SPARK-26782:
-------------------------------------
Summary: Wrong column resolved when joining twice with the same dataframe
Key: SPARK-26782
URL: https://issues.apache.org/jira/browse/SPARK-26782
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.3.1
Reporter: Vladimir Prus
# Execute the following code:
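{code:scala}
{
  // spark-shell already has these implicits in scope; they provide toDF and $"..."
  import spark.implicits._

  val events = Seq(("a", 0)).toDF("id", "ts")
  val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")

  // Two aliases of the same DataFrame
  val dimOriginal = dim.as("dim")
  val dimShifted = dim.as("dimShifted")

  // First join: keep the dim row whose [start, end) interval contains ts
  val r = events
    .join(dimOriginal, "id")
    .where(dimOriginal("start") <= $"ts" && $"ts" < dimOriginal("end"))

  // Second join: keep the dimShifted row whose [start, end) interval contains ts + 24
  val r2 = r
    .join(dimShifted, "id")
    .where(dimShifted("start") <= $"ts" + 24 && $"ts" + 24 < dimShifted("end"))

  r2.show()
  r2.explain(true)
}
{code}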
# Expected effect:
** One row is shown
** Logical plan shows two independent joins, one with "dim" and one with "dimShifted"
# Observed effect:
** No rows are printed.
** Logical plan shows two filters are applied:
*** 'Filter ((start#17 <= ('ts + 24)) && (('ts + 24) < end#18))'
*** Filter ((start#17 <= ts#6) && (ts#6 < end#18))
** Both filters refer to the same start#17 and end#18 attributes, so they are applied to the columns of the same dataframe rather than of two different ones.
** It appears that dimShifted("start") resolves to the same attribute as dimOriginal("start").
# Workaround: I get the expected result if I replace the second {{where}} with one that uses qualified column names:
{code:scala}
.where($"dimShifted.start" <= $"ts" + 24 && $"ts" + 24 < $"dimShifted.end")
{code}
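One way to check the diagnosis in the observed effect is to compare the Catalyst expression IDs behind the two column references. In Spark 2.x, {{Dataset.as(alias)}} wraps the existing plan in a {{SubqueryAlias}}, which keeps the child's attribute IDs and only changes their qualifier, so both aliases would be expected to expose the same ID for "start". The sketch below is an illustration under assumptions, not part of the original report: it assumes a local {{SparkSession}}, the object and method names ({{SelfJoinExprIds}}, {{startExprIds}}) are hypothetical, and it relies on {{Column.expr}}, which exposes Catalyst internals rather than a stable public API.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{ExprId, NamedExpression}

object SelfJoinExprIds {
  // Hypothetical helper: returns the expression IDs behind
  // dimOriginal("start") and dimShifted("start") from the report.
  def startExprIds(spark: SparkSession): (ExprId, ExprId) = {
    import spark.implicits._

    val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")
    val dimOriginal = dim.as("dim")
    val dimShifted = dim.as("dimShifted")

    // Dataset.apply(colName) resolves the name eagerly against the Dataset's
    // analyzed plan, so Column.expr is already a resolved NamedExpression.
    val originalId = dimOriginal("start").expr.asInstanceOf[NamedExpression].exprId
    val shiftedId = dimShifted("start").expr.asInstanceOf[NamedExpression].exprId
    (originalId, shiftedId)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-26782-exprids")
      .getOrCreate()
    try {
      val (originalId, shiftedId) = startExprIds(spark)
      println(s"dimOriginal(start): $originalId")
      println(s"dimShifted(start):  $shiftedId")
      println(s"same attribute: ${originalId == shiftedId}")
    } finally {
      spark.stop()
    }
  }
}
{code}

Based on the plan in this report, where both filters use start#17, the two IDs would be expected to be equal on 2.3.1. That is consistent with the workaround: $"dimShifted.start" is an unresolved, name-based reference that the analyzer resolves against the joined plan, rather than a column already bound to a specific attribute ID.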