Posted to issues@spark.apache.org by "Herman van Hovell (JIRA)" <ji...@apache.org> on 2016/11/28 01:11:58 UTC

[jira] [Created] (SPARK-18604) Collapse Window optimizer rule changes column order

Herman van Hovell created SPARK-18604:
-----------------------------------------

             Summary: Collapse Window optimizer rule changes column order
                 Key: SPARK-18604
                 URL: https://issues.apache.org/jira/browse/SPARK-18604
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Herman van Hovell


The recently added CollapseWindow optimizer rule changes the order of the output attributes. This modifies the schema of the logical plan, which an optimization should never do, and it breaks `collect()` in a subtle way: the row encoder is bound to the output of the logical plan rather than to the output of the optimized plan, so rows produced in the new column order are decoded against the old one.

For example, the following code:
{noformat}
val customers = Seq(
  ("Alice", "2016-05-01", 50.00),
  ("Alice", "2016-05-03", 45.00),
  ("Alice", "2016-05-04", 55.00),
  ("Bob", "2016-05-01", 25.00),
  ("Bob", "2016-05-04", 29.00),
  ("Bob", "2016-05-06", 27.00)).
  toDF("name", "date", "amountSpent")
 
// Import the window functions.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
 
// Create a window spec.
val wSpec1 = Window.partitionBy("name").orderBy("date").rowsBetween(-1, 1)
val df2 = customers
  .withColumn("total", sum(customers("amountSpent")).over(wSpec1))
  .withColumn("cnt", count(customers("amountSpent")).over(wSpec1))
{noformat}
...yields the following incorrect result, with garbage values in the `total` and `cnt` columns:
{noformat}
+-----+----------+-----------+--------+-------------------+
| name|      date|amountSpent|   total|                cnt|
+-----+----------+-----------+--------+-------------------+
|  Bob|2016-05-01|       25.0|1.0E-323|4632796641680687104|
|  Bob|2016-05-04|       29.0|1.5E-323|4635400285215260672|
|  Bob|2016-05-06|       27.0|1.0E-323|4633078116657397760|
|Alice|2016-05-01|       50.0|1.0E-323|4636385447633747968|
|Alice|2016-05-03|       45.0|1.5E-323|4639481672377565184|
|Alice|2016-05-04|       55.0|1.0E-323|4636737291354636288|
+-----+----------+-----------+--------+-------------------+
{noformat}
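
The garbage values are consistent with the `total` (double) and `cnt` (long) columns having traded places while the encoder still reads them with their original types. The sketch below is an illustration of that reading, not Spark code: `SwappedColumns` and its helpers are hypothetical names, and only JDK methods are used. It takes the values one would expect for the first "Bob" row (a sum of 54.0 over 2 rows) and reinterprets each with the other column's type.

```scala
// Illustrative sketch; SwappedColumns and its helpers are hypothetical names,
// not part of Spark. Only JDK methods are used.
object SwappedColumns {
  // Reinterpret the bits of a Long as a Double, as a Double-typed reader
  // would if a count ended up in the "total" position.
  def readLongAsDouble(bits: Long): Double =
    java.lang.Double.longBitsToDouble(bits)

  // Reinterpret the bits of a Double as a Long, as a Long-typed reader
  // would if a sum ended up in the "cnt" position.
  def readDoubleAsLong(value: Double): Long =
    java.lang.Double.doubleToLongBits(value)

  def main(args: Array[String]): Unit = {
    // First "Bob" row: the frame rowsBetween(-1, 1) covers 25.00 and 29.00.
    val expectedTotal = 25.00 + 29.00 // 54.0
    val expectedCnt = 2L

    println(readLongAsDouble(expectedCnt))   // 1.0E-323
    println(readDoubleAsLong(expectedTotal)) // 4632796641680687104
  }
}
```

Both printed values match the first "Bob" row of the table above, which suggests the data itself is computed correctly and only the column positions disagree with the schema used for decoding.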


