Posted to issues@spark.apache.org by "Herman van Hovell (JIRA)" <ji...@apache.org> on 2016/11/28 01:11:58 UTC
[jira] [Created] (SPARK-18604) Collapse Window optimizer rule changes column order
Herman van Hovell created SPARK-18604:
-----------------------------------------
Summary: Collapse Window optimizer rule changes column order
Key: SPARK-18604
URL: https://issues.apache.org/jira/browse/SPARK-18604
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Herman van Hovell
The recently added CollapseWindow optimizer rule changes the order of the output attributes. This modifies the schema of the logical plan, which an optimization should never do, and it breaks `collect()` in a subtle way: we bind the row encoder to the output of the logical plan and not to that of the optimized plan, so rows are decoded against the wrong column order.
For example, the following code:
{noformat}
// Import the window functions.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val customers = Seq(
  ("Alice", "2016-05-01", 50.00),
  ("Alice", "2016-05-03", 45.00),
  ("Alice", "2016-05-04", 55.00),
  ("Bob", "2016-05-01", 25.00),
  ("Bob", "2016-05-04", 29.00),
  ("Bob", "2016-05-06", 27.00)).
  toDF("name", "date", "amountSpent")

// Create a window spec covering the previous, current and next row per customer.
val wSpec1 = Window.partitionBy("name").orderBy("date").rowsBetween(-1, 1)

// Two window expressions over the same spec; this is the case CollapseWindow merges.
val df2 = customers
  .withColumn("total", sum(customers("amountSpent")).over(wSpec1))
  .withColumn("cnt", count(customers("amountSpent")).over(wSpec1))

// Display the result.
df2.show()
{noformat}
...yields the following incorrect result (note the total and cnt columns):
{noformat}
+-----+----------+-----------+--------+-------------------+
| name|      date|amountSpent|   total|                cnt|
+-----+----------+-----------+--------+-------------------+
|  Bob|2016-05-01|       25.0|1.0E-323|4632796641680687104|
|  Bob|2016-05-04|       29.0|1.5E-323|4635400285215260672|
|  Bob|2016-05-06|       27.0|1.0E-323|4633078116657397760|
|Alice|2016-05-01|       50.0|1.0E-323|4636385447633747968|
|Alice|2016-05-03|       45.0|1.5E-323|4639481672377565184|
|Alice|2016-05-04|       55.0|1.0E-323|4636737291354636288|
+-----+----------+-----------+--------+-------------------+
{noformat}
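The values above are consistent with the two window columns having traded places: each 64-bit slot appears to be decoded with the other column's type. The following is a minimal sketch of that check, not part of the original report. The object name `SwappedWindowColumns` and the helper `unswap` are hypothetical names introduced for illustration; the sketch uses only the standard JVM bit-conversion methods and needs no Spark session.

```scala
import java.lang.{Double => JDouble}

// Hypothetical helper (not Spark API): reinterprets a displayed (total, cnt)
// pair under the assumption that the two 64-bit slots were swapped.
object SwappedWindowColumns {
  def unswap(totalShown: Double, cntShown: Long): (Double, Long) = {
    // The bits displayed under "cnt" are read back as a double (the real sum).
    val realTotal = JDouble.longBitsToDouble(cntShown)
    // The bits displayed under "total" are read back as a long (the real count).
    val realCnt = JDouble.doubleToLongBits(totalShown)
    (realTotal, realCnt)
  }

  def main(args: Array[String]): Unit = {
    // Bob, 2016-05-01: the window covers 25.0 and 29.0.
    println(unswap(1.0E-323, 4632796641680687104L)) // (54.0,2)
    // Bob, 2016-05-04: the window covers 25.0, 29.0 and 27.0.
    println(unswap(1.5E-323, 4635400285215260672L)) // (81.0,3)
  }
}
```

Both rows decode to the sum and count one would expect from the window spec, which supports the column-order explanation. In a live session, comparing `df2.queryExecution.analyzed.output` with `df2.queryExecution.optimizedPlan.output` should show the reordering directly, assuming a Spark build that contains the rule.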