Posted to reviews@spark.apache.org by guowei2 <gi...@git.apache.org> on 2015/04/21 08:07:25 UTC

[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

GitHub user guowei2 opened a pull request:

    https://github.com/apache/spark/pull/5604

    [SPARK-1442][SQL][WIP] Window Function Support for Spark SQL

    This PR is currently WIP due to an NPE when generating the golden answer.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guowei2/spark windowImplement

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5604.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5604
    
----
commit 31397d1f50922d1dfa27af6bf4559a6f0080c7cc
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-03-04T03:39:38Z

    window function

commit ea8a9e4dc08026292a40c4d95cff55117ea98ac6
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-03-27T07:41:22Z

    reflect window function

commit 42de8f907daea4a4dfede306d8c3b5cd5b4f99e8
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-03-30T05:11:46Z

    code style

commit cd2b54415e94e901ee6c3924551b6c9514f1e2fe
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-03-30T10:19:17Z

    sql parser support to use spark own aggregate functions

commit 4ab160c4b8d771b554543002d242ff8cb6246673
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-03-31T08:01:47Z

    check whether to sort by other key in one partition

commit 8ec5bc96b73cb8bb17af2b41a075017ca24a26f9
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-04-02T03:16:50Z

    fix window with group by

commit cd8a0bc0eaff7a6e0247bff5480ea0b4a08b6ba9
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-04-03T02:24:49Z

    fix rank dese_rank lead lag

commit 33bebb680f980786326d919baf1a44981dc274c7
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-04-03T03:17:57Z

    fix rank dese_rank lead lag

commit af3b618494bf5944f4dfeb68b352112ec4fe34fc
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-04-07T07:13:26Z

    fix bug

commit e1f2e9578596288246363f90f30913034a9bd7d1
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-04-07T07:37:21Z

    fix rebase issue

commit 6858572d5a5bf6a4e81461debb713a13c66ff104
Author: guowei2 <gu...@asiainfo.com>
Date:   2015-04-21T05:20:37Z

    remove test suite temporarily for NPE when generating golden answer

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95982809
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30940/
    Test FAILed.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-97521503
  
    Guys, thank you for your comments. I am updating this PR now. I should have an update later today and will reply to you all then.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95774445
  
      [Test build #30900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30900/consoleFull) for   PR 5604 at commit [`b4fa747`](https://github.com/apache/spark/commit/b4fa74776f6ea767e160f81d93ae9ab49f0b88c7).




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-97671446
  
    I have been working on it for a few days. I believe my update will cover most of your comments. Please hold further comments until my update is up. Thanks :)




[GitHub] spark pull request: [SPARK-1442][SQL] Window Function Support for ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r55269635
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -195,9 +202,28 @@ case class Aggregate(
         child: LogicalPlan)
       extends UnaryNode {
     
    +  override lazy val resolved: Boolean = {
    +    val hasWindowExpressions = aggregateExpressions.exists ( _.collect {
    +        case window: WindowExpression => window
    +      }.nonEmpty
    +    )
    +
    +    !expressions.exists(!_.resolved) && childrenResolved && !hasWindowExpressions
    +  }
    +
       override def output: Seq[Attribute] = aggregateExpressions.map(_.toAttribute)
     }
     
    +case class Window(
    +    projectList: Seq[Attribute],
    --- End diff --
    
    Can any of you submit a PR to make the change?




[GitHub] spark pull request: [SPARK-1442][SQL] Window Function Support for ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r55142233
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -195,9 +202,28 @@ case class Aggregate(
         child: LogicalPlan)
       extends UnaryNode {
     
    +  override lazy val resolved: Boolean = {
    +    val hasWindowExpressions = aggregateExpressions.exists ( _.collect {
    +        case window: WindowExpression => window
    +      }.nonEmpty
    +    )
    +
    +    !expressions.exists(!_.resolved) && childrenResolved && !hasWindowExpressions
    +  }
    +
       override def output: Seq[Attribute] = aggregateExpressions.map(_.toAttribute)
     }
     
    +case class Window(
    +    projectList: Seq[Attribute],
    --- End diff --
    
    I have the same question.




[GitHub] spark pull request: [SPARK-1442][SQL] Window Function Support for ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r55140466
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -195,9 +202,28 @@ case class Aggregate(
         child: LogicalPlan)
       extends UnaryNode {
     
    +  override lazy val resolved: Boolean = {
    +    val hasWindowExpressions = aggregateExpressions.exists ( _.collect {
    +        case window: WindowExpression => window
    +      }.nonEmpty
    +    )
    +
    +    !expressions.exists(!_.resolved) && childrenResolved && !hasWindowExpressions
    +  }
    +
       override def output: Seq[Attribute] = aggregateExpressions.map(_.toAttribute)
     }
     
    +case class Window(
    +    projectList: Seq[Attribute],
    --- End diff --
    
    Why do we need a `projectList` in `Window`? Isn't it always equal to `child.output`?




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95782466
  
      [Test build #30904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30904/consoleFull) for   PR 5604 at commit [`cae7079`](https://github.com/apache/spark/commit/cae7079fd7828ea84dddf654a272640f08276a0e).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
      * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
      * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
      * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
      * `case class WindowAggregate(`
      * `case class WindowAggregate(`
      * `  case class ComputedWindow(`
      * `  case class WindowFunctionInfo(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94652284
  
      [Test build #30643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30643/consoleFull) for   PR 5604 at commit [`6858572`](https://github.com/apache/spark/commit/6858572d5a5bf6a4e81461debb713a13c66ff104).




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95982770
  
      [Test build #30940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30940/consoleFull) for   PR 5604 at commit [`4bb2c70`](https://github.com/apache/spark/commit/4bb2c70c17ff8fe6ddcf77d6e74c5316334624cc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
      * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
      * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
      * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
      * `case class WindowAggregate(`
      * `case class WindowAggregate(`
      * `  case class ComputedWindow(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94714530
  
      [Test build #30652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30652/consoleFull) for   PR 5604 at commit [`4453aff`](https://github.com/apache/spark/commit/4453aff95a0fff2a3607b8d82376c474859850a2).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
      * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
      * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
      * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
      * `case class WindowAggregate(`
      * `case class WindowAggregate(`
      * `  case class ComputedWindow(`
      * `  case class WindowFunctionInfo(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request #5604: [SPARK-1442][SQL] Window Function Support for Spar...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r157945816
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
    @@ -0,0 +1,340 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import org.apache.spark.sql.catalyst.analysis.UnresolvedException
    +import org.apache.spark.sql.catalyst.errors.TreeNodeException
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types.{NumericType, DataType}
    +
    +/**
    + * The trait of the Window Specification (specified in the OVER clause or WINDOW clause) for
    + * Window Functions.
    + */
    +sealed trait WindowSpec
    +
    +/**
    + * The specification for a window function.
    + * @param partitionSpec It defines the way that input rows are partitioned.
    + * @param orderSpec It defines the ordering of rows in a partition.
    + * @param frameSpecification It defines the window frame in a partition.
    + */
    +case class WindowSpecDefinition(
    +    partitionSpec: Seq[Expression],
    +    orderSpec: Seq[SortOrder],
    +    frameSpecification: WindowFrame) extends Expression with WindowSpec {
    +
    +  def validate: Option[String] = frameSpecification match {
    +    case UnspecifiedFrame =>
    +      Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " +
    +        "during analysis. Please file a bug report.")
    +    case frame: SpecifiedWindowFrame => frame.validate.orElse {
    +      def checkValueBasedBoundaryForRangeFrame(): Option[String] = {
    +        if (orderSpec.length > 1)  {
    +          // It is not allowed to have a value-based PRECEDING and FOLLOWING
    +          // as the boundary of a Range Window Frame.
    +          Some("This Range Window Frame only accepts at most one ORDER BY expression.")
    +        } else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) {
    +          Some("The data type of the expression in the ORDER BY clause should be a numeric type.")
    +        } else {
    +          None
    +        }
    +      }
    +
    +      (frame.frameType, frame.frameStart, frame.frameEnd) match {
    +        case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame()
    +        case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame()
    +        case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame()
    +        case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame()
    +        case (_, _, _) => None
    +      }
    +    }
    +  }
    +
    +  type EvaluatedType = Any
    +
    +  override def children: Seq[Expression]  = partitionSpec ++ orderSpec
    +
    +  override lazy val resolved: Boolean =
    +    childrenResolved && frameSpecification.isInstanceOf[SpecifiedWindowFrame]
    +
    +
    +  override def toString: String = simpleString
    +
    +  override def eval(input: Row): EvaluatedType = throw new UnsupportedOperationException
    +  override def nullable: Boolean = true
    +  override def foldable: Boolean = false
    +  override def dataType: DataType = throw new UnsupportedOperationException
    +}
    +
    +/**
    + * A Window specification reference that refers to the [[WindowSpecDefinition]] defined
    + * under the name `name`.
    + */
    +case class WindowSpecReference(name: String) extends WindowSpec
    +
    +/**
    + * The trait used to represent the type of a Window Frame.
    + */
    +sealed trait FrameType
    +
    +/**
    + * RowFrame treats rows in a partition individually. When a [[ValuePreceding]]
    + * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered
    + * as a physical offset.
    + * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame,
    + * from the row precedes the current row to the row follows the current row.
    + */
    +case object RowFrame extends FrameType
    +
    +/**
    + * RangeFrame treats rows in a partition as groups of peers.
    + * All rows having the same `ORDER BY` ordering are considered as peers.
    + * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]],
    + * the value is considered as a logical offset.
    + * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`,
    + * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values
    + * `expr` are in the range of [v-1, v+1].
    + *
    + * If `ORDER BY` clause is not defined, all rows in the partition is considered as peers
    + * of the current row.
    + */
    +case object RangeFrame extends FrameType
    +
    +/**
    + * The trait used to represent the type of a Window Frame Boundary.
    + */
    +sealed trait FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean
    +}
    +
    +/** UNBOUNDED PRECEDING boundary. */
    +case object UnboundedPreceding extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => true
    +    case vp: ValuePreceding => true
    +    case CurrentRow => true
    +    case vf: ValueFollowing => true
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = "UNBOUNDED PRECEDING"
    +}
    +
    +/** <value> PRECEDING boundary. */
    +case class ValuePreceding(value: Int) extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case ValuePreceding(anotherValue) => value >= anotherValue
    +    case CurrentRow => true
    +    case vf: ValueFollowing => true
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = s"$value PRECEDING"
    +}
    +
    +/** CURRENT ROW boundary. */
    +case object CurrentRow extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case vp: ValuePreceding => false
    +    case CurrentRow => true
    +    case vf: ValueFollowing => true
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = "CURRENT ROW"
    +}
    +
    +/** <value> FOLLOWING boundary. */
    +case class ValueFollowing(value: Int) extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case vp: ValuePreceding => false
    +    case CurrentRow => false
    +    case ValueFollowing(anotherValue) => value <= anotherValue
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = s"$value FOLLOWING"
    +}
    +
    +/** UNBOUNDED FOLLOWING boundary. */
    +case object UnboundedFollowing extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case vp: ValuePreceding => false
    +    case CurrentRow => false
    +    case vf: ValueFollowing => false
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = "UNBOUNDED FOLLOWING"
    +}
    +
    +/**
    + * The trait used to represent the a Window Frame.
    + */
    +sealed trait WindowFrame
    +
    +/** Used as a place holder when a frame specification is not defined.  */
    +case object UnspecifiedFrame extends WindowFrame
    +
    +/** A specified Window Frame. */
    +case class SpecifiedWindowFrame(
    +    frameType: FrameType,
    +    frameStart: FrameBoundary,
    +    frameEnd: FrameBoundary) extends WindowFrame {
    +
    +  /** If this WindowFrame is valid or not. */
    +  def validate: Option[String] = (frameType, frameStart, frameEnd) match {
    +    case (_, UnboundedFollowing, _) =>
    +      Some(s"$UnboundedFollowing is not allowed as the start of a Window Frame.")
    +    case (_, _, UnboundedPreceding) =>
    +      Some(s"$UnboundedPreceding is not allowed as the end of a Window Frame.")
    +    // case (RowFrame, start, end) => ??? RowFrame specific rule
    +    // case (RangeFrame, start, end) => ??? RangeFrame specific rule
    +    case (_, start, end) =>
    +      if (start.notFollows(end)) {
    +        None
    +      } else {
    +        val reason =
    +          s"The end of this Window Frame $end is smaller than the start of " +
    +          s"this Window Frame $start."
    +        Some(reason)
    +      }
    +  }
    +
    +  override def toString: String = frameType match {
    +    case RowFrame => s"ROWS BETWEEN $frameStart AND $frameEnd"
    +    case RangeFrame => s"RANGE BETWEEN $frameStart AND $frameEnd"
    +  }
    +}
    +
    +object SpecifiedWindowFrame {
    +  /**
    +   *
    +   * @param hasOrderSpecification If the window spec has order by expressions.
    +   * @param acceptWindowFrame If the window function accepts user-specified frame.
    +   * @return
    +   */
    +  def defaultWindowFrame(
    +      hasOrderSpecification: Boolean,
    +      acceptWindowFrame: Boolean): SpecifiedWindowFrame = {
    +    if (hasOrderSpecification && acceptWindowFrame) {
    --- End diff --
    
    Thanks for the explanation!




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94652415
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30643/
    Test FAILed.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94652412
  
      [Test build #30643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30643/consoleFull) for   PR 5604 at commit [`6858572`](https://github.com/apache/spark/commit/6858572d5a5bf6a4e81461debb713a13c66ff104).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
      * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
      * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
      * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
      * `case class WindowAggregate(`
      * `case class WindowAggregate(`
      * `  case class ComputedWindow(`
      * `  case class WindowFunctionInfo(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-96766834
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by hbutani <gi...@git.apache.org>.
Github user hbutani commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-96220742
  
    Hi,
    
    I spent some time on this patch. It is a good start, but there are several semantic issues, and I have some comments/suggestions about the execution. I hope you don't mind my comments.
    
    Semantic issues:
    1. The ORDER BY clause in windowing implies an ordering within each logical partition. Strictly speaking, doing a global order (across all logical partitions within a physical partition, i.e. all rows that end up at the same node) is not quite the same, especially when you have rows with the same values (on the order expressions) across partitions. This way of implementing it also forces you to read all the rows in the WindowAggregate execution before applying any aggregations, which precludes any 'streaming' style execution. Effectively you are repartitioning in WindowAggregate, albeit within each WindowAggregate invocation. More on this below, in the execution comments.
    
    2. 'CURRENT ROW' does not mean an offset of '0' for range-based windowing. It implies including rows before or after the current row (depending on the direction of the boundary) that have the same order-expression values (see the sketch after this message).
    
    3. The absence of a partition clause does not imply no partitioning; it implies that all rows should be treated as one partition.
    
    Execution comments/suggestions:
    1. You want to consider holding onto as few rows as possible. There are several things to consider:
      a. A relatively easy first step is to read only one logical partition at a time. But this requires that the incoming rows are already in the correct order.
      b. My suggestion would be to introduce UDAF-level streaming as early as possible. This way, for many common cases, the memory footprint can be very low. See the work done in Hive around 'org.apache.hadoop.hive.ql.udf.generic.ISupportStreamingModeForWindowing'. This also has a big performance impact; in Hive we encountered users who work with partitions of considerable size (tens of thousands of rows), and the O(n*n) straightforward way of computing window aggregates was a huge bottleneck.
    
    2. Why are you translating all window invocations to HiveUDAF? This way you are taking a hit to convert from Spark to Hive values and back. At least for the aggregate functions already in Spark, why not use them?
    
    3. The Exchange operator has a parameter to specify ordering within a partition. Is there a way to generate a physical plan that utilizes this? You wouldn't need a Sort operator above the Exchange then. (I am still learning about Spark SQL, so I am not sure of the implications of this.)
    
    regards,
    Harish Butani.
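
To make semantic point 2 above concrete, here is a minimal, self-contained Scala sketch (the data and helper names are made up for illustration and are not part of this patch) contrasting a ROWS frame, which ends at the current physical row, with a RANGE frame, which ends at the last peer of the current row, i.e. it includes every row sharing the current row's ORDER BY value:

    // Standalone illustration of ROWS vs RANGE semantics for a frame ending at CURRENT ROW.
    object CurrentRowPeersDemo {
      // (orderKey, value) pairs of one window partition, already sorted by orderKey.
      // The second and third rows are peers: they share orderKey 2.
      val partition = Seq((1, 10), (2, 20), (2, 30), (3, 40))

      // ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
      // the frame ends at the current physical row.
      def rowsRunningSum(rows: Seq[(Int, Int)]): Seq[Int] =
        rows.indices.map(i => rows.take(i + 1).map(_._2).sum)

      // RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
      // the frame ends at the last peer of the current row, so all rows whose
      // orderKey equals the current orderKey are included.
      def rangeRunningSum(rows: Seq[(Int, Int)]): Seq[Int] =
        rows.map { case (key, _) => rows.filter(_._1 <= key).map(_._2).sum }

      def main(args: Array[String]): Unit = {
        println(rowsRunningSum(partition))  // Vector(10, 30, 60, 100)
        println(rangeRunningSum(partition)) // List(10, 60, 60, 100) -- peers get the same result
      }
    }

The two peer rows receiving the same value (60) in the RANGE case is exactly why CURRENT ROW cannot be treated as a physical offset of 0 for range-based windowing.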




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94666649
  
      [Test build #30652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30652/consoleFull) for   PR 5604 at commit [`4453aff`](https://github.com/apache/spark/commit/4453aff95a0fff2a3607b8d82376c474859850a2).




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95868922
  
      [Test build #30928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30928/consoleFull) for   PR 5604 at commit [`d07101b`](https://github.com/apache/spark/commit/d07101bd5a6f3b30532c4d4d77ab8d310607b684).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
      * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
      * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
      * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
      * `case class WindowAggregate(`
      * `case class WindowAggregate(`
      * `  case class ComputedWindow(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request #5604: [SPARK-1442][SQL] Window Function Support for Spar...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r157931911
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
    @@ -0,0 +1,340 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import org.apache.spark.sql.catalyst.analysis.UnresolvedException
    +import org.apache.spark.sql.catalyst.errors.TreeNodeException
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types.{NumericType, DataType}
    +
    +/**
    + * The trait of the Window Specification (specified in the OVER clause or WINDOW clause) for
    + * Window Functions.
    + */
    +sealed trait WindowSpec
    +
    +/**
    + * The specification for a window function.
    + * @param partitionSpec It defines the way that input rows are partitioned.
    + * @param orderSpec It defines the ordering of rows in a partition.
    + * @param frameSpecification It defines the window frame in a partition.
    + */
    +case class WindowSpecDefinition(
    +    partitionSpec: Seq[Expression],
    +    orderSpec: Seq[SortOrder],
    +    frameSpecification: WindowFrame) extends Expression with WindowSpec {
    +
    +  def validate: Option[String] = frameSpecification match {
    +    case UnspecifiedFrame =>
    +      Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " +
    +        "during analysis. Please file a bug report.")
    +    case frame: SpecifiedWindowFrame => frame.validate.orElse {
    +      def checkValueBasedBoundaryForRangeFrame(): Option[String] = {
    +        if (orderSpec.length > 1)  {
    +          // It is not allowed to have a value-based PRECEDING and FOLLOWING
    +          // as the boundary of a Range Window Frame.
    +          Some("This Range Window Frame only accepts at most one ORDER BY expression.")
    +        } else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) {
    +          Some("The data type of the expression in the ORDER BY clause should be a numeric type.")
    +        } else {
    +          None
    +        }
    +      }
    +
    +      (frame.frameType, frame.frameStart, frame.frameEnd) match {
    +        case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame()
    +        case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame()
    +        case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame()
    +        case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame()
    +        case (_, _, _) => None
    +      }
    +    }
    +  }
    +
    +  type EvaluatedType = Any
    +
    +  override def children: Seq[Expression]  = partitionSpec ++ orderSpec
    +
    +  override lazy val resolved: Boolean =
    +    childrenResolved && frameSpecification.isInstanceOf[SpecifiedWindowFrame]
    +
    +
    +  override def toString: String = simpleString
    +
    +  override def eval(input: Row): EvaluatedType = throw new UnsupportedOperationException
    +  override def nullable: Boolean = true
    +  override def foldable: Boolean = false
    +  override def dataType: DataType = throw new UnsupportedOperationException
    +}
    +
    +/**
    + * A Window specification reference that refers to the [[WindowSpecDefinition]] defined
    + * under the name `name`.
    + */
    +case class WindowSpecReference(name: String) extends WindowSpec
    +
    +/**
    + * The trait used to represent the type of a Window Frame.
    + */
    +sealed trait FrameType
    +
    +/**
    + * RowFrame treats rows in a partition individually. When a [[ValuePreceding]]
    + * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered
    + * as a physical offset.
    + * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame,
    + * from the row precedes the current row to the row follows the current row.
    + */
    +case object RowFrame extends FrameType
    +
    +/**
    + * RangeFrame treats rows in a partition as groups of peers.
    + * All rows having the same `ORDER BY` ordering are considered as peers.
    + * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]],
    + * the value is considered as a logical offset.
    + * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`,
    + * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values
    + * `expr` are in the range of [v-1, v+1].
    + *
    + * If `ORDER BY` clause is not defined, all rows in the partition is considered as peers
    + * of the current row.
    + */
    +case object RangeFrame extends FrameType
    +
    +/**
    + * The trait used to represent the type of a Window Frame Boundary.
    + */
    +sealed trait FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean
    +}
    +
    +/** UNBOUNDED PRECEDING boundary. */
    +case object UnboundedPreceding extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => true
    +    case vp: ValuePreceding => true
    +    case CurrentRow => true
    +    case vf: ValueFollowing => true
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = "UNBOUNDED PRECEDING"
    +}
    +
    +/** <value> PRECEDING boundary. */
    +case class ValuePreceding(value: Int) extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case ValuePreceding(anotherValue) => value >= anotherValue
    +    case CurrentRow => true
    +    case vf: ValueFollowing => true
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = s"$value PRECEDING"
    +}
    +
    +/** CURRENT ROW boundary. */
    +case object CurrentRow extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case vp: ValuePreceding => false
    +    case CurrentRow => true
    +    case vf: ValueFollowing => true
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = "CURRENT ROW"
    +}
    +
    +/** <value> FOLLOWING boundary. */
    +case class ValueFollowing(value: Int) extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case vp: ValuePreceding => false
    +    case CurrentRow => false
    +    case ValueFollowing(anotherValue) => value <= anotherValue
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = s"$value FOLLOWING"
    +}
    +
    +/** UNBOUNDED FOLLOWING boundary. */
    +case object UnboundedFollowing extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case vp: ValuePreceding => false
    +    case CurrentRow => false
    +    case vf: ValueFollowing => false
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = "UNBOUNDED FOLLOWING"
    +}
    +
    +/**
    + * The trait used to represent the a Window Frame.
    + */
    +sealed trait WindowFrame
    +
    +/** Used as a place holder when a frame specification is not defined.  */
    +case object UnspecifiedFrame extends WindowFrame
    +
    +/** A specified Window Frame. */
    +case class SpecifiedWindowFrame(
    +    frameType: FrameType,
    +    frameStart: FrameBoundary,
    +    frameEnd: FrameBoundary) extends WindowFrame {
    +
    +  /** If this WindowFrame is valid or not. */
    +  def validate: Option[String] = (frameType, frameStart, frameEnd) match {
    +    case (_, UnboundedFollowing, _) =>
    +      Some(s"$UnboundedFollowing is not allowed as the start of a Window Frame.")
    +    case (_, _, UnboundedPreceding) =>
    +      Some(s"$UnboundedPreceding is not allowed as the end of a Window Frame.")
    +    // case (RowFrame, start, end) => ??? RowFrame specific rule
    +    // case (RangeFrame, start, end) => ??? RangeFrame specific rule
    +    case (_, start, end) =>
    +      if (start.notFollows(end)) {
    +        None
    +      } else {
    +        val reason =
    +          s"The end of this Window Frame $end is smaller than the start of " +
    +          s"this Window Frame $start."
    +        Some(reason)
    +      }
    +  }
    +
    +  override def toString: String = frameType match {
    +    case RowFrame => s"ROWS BETWEEN $frameStart AND $frameEnd"
    +    case RangeFrame => s"RANGE BETWEEN $frameStart AND $frameEnd"
    +  }
    +}
    +
    +object SpecifiedWindowFrame {
    +  /**
    +   *
    +   * @param hasOrderSpecification If the window spec has order by expressions.
    +   * @param acceptWindowFrame If the window function accepts user-specified frame.
    +   * @return
    +   */
    +  def defaultWindowFrame(
    +      hasOrderSpecification: Boolean,
    +      acceptWindowFrame: Boolean): SpecifiedWindowFrame = {
    +    if (hasOrderSpecification && acceptWindowFrame) {
    --- End diff --
    
    Do you know why the default window frame differs depending on the order spec here? Is that for some kind of compatibility issue? cc @cloud-fan @gatorsmile @yhuai 
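
For context: the behavior being asked about matches the SQL standard and Hive. When an ORDER BY is present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so aggregates become cumulative over the ordered partition; without an ORDER BY, the frame is the whole partition. A small standalone Scala sketch of the resulting difference (illustrative data only, not tied to Spark's API):

    // Why the default frame depends on the order specification (illustrative data only).
    object DefaultFrameDemo {
      // Values of one window partition, listed in ORDER BY order; distinct keys assumed,
      // so peer handling is elided here.
      val partition = Seq(10, 20, 30)

      // With an ORDER BY: default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
      // so sum(...) behaves as a running total.
      val withOrderSpec = partition.indices.map(i => partition.take(i + 1).sum) // Vector(10, 30, 60)

      // Without an ORDER BY: the frame is the entire partition, so every row sees the total.
      val withoutOrderSpec = partition.map(_ => partition.sum) // List(60, 60, 60)

      def main(args: Array[String]): Unit = {
        println(withOrderSpec)
        println(withoutOrderSpec)
      }
    }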




[GitHub] spark pull request: [SPARK-1442][SQL] Window Function Support for ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r55268103
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -195,9 +202,28 @@ case class Aggregate(
         child: LogicalPlan)
       extends UnaryNode {
     
    +  override lazy val resolved: Boolean = {
    +    val hasWindowExpressions = aggregateExpressions.exists ( _.collect {
    +        case window: WindowExpression => window
    +      }.nonEmpty
    +    )
    +
    +    !expressions.exists(!_.resolved) && childrenResolved && !hasWindowExpressions
    +  }
    +
       override def output: Seq[Attribute] = aggregateExpressions.map(_.toAttribute)
     }
     
    +case class Window(
    +    projectList: Seq[Attribute],
    --- End diff --
    
    Looks like we do not actually need `projectList` since it is always `child.output`.
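
For reference, a rough sketch of how the node could look without a separate `projectList`, with the pass-through columns taken directly from the child; the exact fields here are illustrative, not necessarily what will be merged:

    // Hypothetical shape of a Window logical node without a projectList: every input
    // attribute is passed through unchanged and only the computed window columns are added.
    import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, NamedExpression, SortOrder}
    import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

    case class Window(
        windowExpressions: Seq[NamedExpression],
        partitionSpec: Seq[Expression],
        orderSpec: Seq[SortOrder],
        child: LogicalPlan) extends UnaryNode {

      override def output: Seq[Attribute] =
        child.output ++ windowExpressions.map(_.toAttribute)
    }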




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95780263
  
    Can you merge https://github.com/guowei2/spark/pull/3/files?




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r29012757
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
    @@ -1051,6 +1060,168 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
           throw new NotImplementedError(s"No parse rules for:\n ${dumpTree(a).toString} ")
       }
     
    +  protected val windowDefs = new ThreadLocal[Map[String, Seq[ASTNode]]] {
    +    override def initialValue() = Map.empty[String, Seq[ASTNode]]
    +  }
    --- End diff --
    
    It would be great if we could remove this `ThreadLocal`, but if that requires too much work, I guess it's OK to leave it as is. `ThreadLocal` tends to be error-prone.
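
To illustrate the alternative being hinted at, here is a small standalone sketch (the node type and method names are stand-ins, not the real HiveQl/ASTNode machinery) of threading the collected WINDOW-clause definitions through the parse methods explicitly instead of stashing them in a shared ThreadLocal:

    // Sketch: pass window definitions along explicitly rather than via ThreadLocal.
    object ExplicitWindowDefs {
      // Stand-in for the parser's AST node type.
      case class AstNode(text: String, children: Seq[AstNode] = Nil)

      // Each parse step receives the definitions collected so far and returns an
      // updated map, so no mutable thread-local state is needed.
      def collectWindowDefs(
          node: AstNode,
          windowDefs: Map[String, AstNode]): Map[String, AstNode] = {
        val fromThisNode = node.children.collect {
          case c if c.text.startsWith("WINDOWDEF:") => c.text.stripPrefix("WINDOWDEF:") -> c
        }.toMap
        node.children.foldLeft(windowDefs ++ fromThisNode)((defs, child) => collectWindowDefs(child, defs))
      }
    }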




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95782172
  
      [Test build #30904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30904/consoleFull) for   PR 5604 at commit [`cae7079`](https://github.com/apache/spark/commit/cae7079fd7828ea84dddf654a272640f08276a0e).




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94655371
  
      [Test build #30649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30649/consoleFull) for   PR 5604 at commit [`d5c980f`](https://github.com/apache/spark/commit/d5c980f157e77f329062f07d9268b1aa1f4dbb19).




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95809888
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30917/
    Test FAILed.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94655848
  
      [Test build #30649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30649/consoleFull) for   PR 5604 at commit [`d5c980f`](https://github.com/apache/spark/commit/d5c980f157e77f329062f07d9268b1aa1f4dbb19).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
      * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
      * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
      * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
      * `case class WindowAggregate(`
      * `case class WindowAggregate(`
      * `  case class ComputedWindow(`
      * `  case class WindowFunctionInfo(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-97662163
  
    we'd better make a new logical plan named `windows`, just like `with`, and do the transform work (such as `windowToPlan`) in the analyzer instead of adding this logic to hiveql, because:
    1. HiveQl.scala is already a very large object, and adding more logic there makes it harder to maintain.
    2. a `windows` logical plan is useful in its own right: we do not want to redo the same work for window function support in every sql parser (after the pluggable parser PR is in, new parsers can be plugged in).
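    
    A minimal sketch of what such a parser-agnostic node could look like (assumed shape only, not the code in this PR):
    
    ```scala
    import org.apache.spark.sql.catalyst.expressions._
    import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}
    
    // Assumed shape only: a Window operator that any SQL parser could emit,
    // leaving resolution and planning of the window functions to the analyzer
    // and planner.
    case class Window(
        projectList: Seq[Attribute],             // child attributes passed through
        windowExpressions: Seq[NamedExpression], // outputs of the window functions
        partitionSpec: Seq[Expression],
        orderSpec: Seq[SortOrder],
        child: LogicalPlan) extends UnaryNode {
    
      override def output: Seq[Attribute] =
        projectList ++ windowExpressions.map(_.toAttribute)
    }
    ```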




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94655851
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30649/
    Test FAILed.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95782470
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30904/
    Test FAILed.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95745545
  
    Hey @guowei2, "Spart" => "Spark" in the PR title :)




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r29013060
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
    @@ -736,32 +741,36 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
             // The projection of the query can either be a normal projection, an aggregation
             // (if there is a group by) or a script transformation.
             val withProject: LogicalPlan = transformation.getOrElse {
    -          val selectExpressions = 
    +          val selectExpressions =
                 nameExpressions(select.getChildren.flatMap(selExprNodeToExpr).toSeq)
    -          Seq(
    -            groupByClause.map(e => e match {
    -              case Token("TOK_GROUPBY", children) =>
    -                // Not a transformation so must be either project or aggregation.
    -                Aggregate(children.map(nodeToExpr), selectExpressions, withLateralView)
    -              case _ => sys.error("Expect GROUP BY")
    -            }),
    -            groupingSetsClause.map(e => e match {
    -              case Token("TOK_GROUPING_SETS", children) =>
    -                val(groupByExprs, masks) = extractGroupingSet(children)
    -                GroupingSets(masks, groupByExprs, withLateralView, selectExpressions)
    -              case _ => sys.error("Expect GROUPING SETS")
    -            }),
    -            rollupGroupByClause.map(e => e match {
    -              case Token("TOK_ROLLUP_GROUPBY", children) =>
    -                Rollup(children.map(nodeToExpr), withLateralView, selectExpressions)
    -              case _ => sys.error("Expect WITH ROLLUP")
    -            }),
    -            cubeGroupByClause.map(e => e match {
    -              case Token("TOK_CUBE_GROUPBY", children) =>
    -                Cube(children.map(nodeToExpr), withLateralView, selectExpressions)
    -              case _ => sys.error("Expect WITH CUBE")
    -            }), 
    -            Some(Project(selectExpressions, withLateralView))).flatten.head
    +
    +          val groupPlan = (selectExprs: Seq[NamedExpression]) =>
    +            Seq(
    +              groupByClause.map(e => e match {
    +                case Token("TOK_GROUPBY", children) =>
    +                  // Not a transformation so must be either project or aggregation.
    +                  Aggregate(children.map(nodeToExpr), selectExprs, withLateralView)
    +                case _ => sys.error("Expect GROUP BY")
    +              }),
    --- End diff --
    
    Actually you can simplify this to:
    
    ```scala
    groupByClause.map {
      case Token("TOK_GROUPBY", children) =>
         ...
      case _ => ...
    }
    ```
    
    Also applies to other parts below.
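    
    For instance, applied to the `TOK_GROUPBY` branch above (same identifiers as in the diff; shown only as an illustration of the suggestion, not as tested code):
    
    ```scala
    groupByClause.map {
      case Token("TOK_GROUPBY", children) =>
        // Not a transformation so must be either project or aggregation.
        Aggregate(children.map(nodeToExpr), selectExprs, withLateralView)
      case _ => sys.error("Expect GROUP BY")
    }
    ```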




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94651361
  
    ok to test




[GitHub] spark pull request: [SPARK-1442][SQL] Window Function Support for ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r55272651
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -195,9 +202,28 @@ case class Aggregate(
         child: LogicalPlan)
       extends UnaryNode {
     
    +  override lazy val resolved: Boolean = {
    +    val hasWindowExpressions = aggregateExpressions.exists ( _.collect {
    +        case window: WindowExpression => window
    +      }.nonEmpty
    +    )
    +
    +    !expressions.exists(!_.resolved) && childrenResolved && !hasWindowExpressions
    +  }
    +
       override def output: Seq[Attribute] = aggregateExpressions.map(_.toAttribute)
     }
     
    +case class Window(
    +    projectList: Seq[Attribute],
    --- End diff --
    
    Sure, will do it today. @yhuai  Thanks!




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95965397
  
      [Test build #30940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30940/consoleFull) for   PR 5604 at commit [`4bb2c70`](https://github.com/apache/spark/commit/4bb2c70c17ff8fe6ddcf77d6e74c5316334624cc).




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95878550
  
    @guowei2, can you generate the golden answers for this locally?




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-97518465
  
    Hi,
    
    I have been experimenting with Window functions in Spark SQL as well; my work is partially based on this PR. You can find it [here](https://github.com/hvanhovell/spark-window).
    
    I have deviated from the original implementation in a couple of ways:
    - It is implemented as an extension to Spark SQL rather than Hive. All aggregates use the Spark SQL implementations (not the Hive UDAFs).
    - It uses the Spark 1.4 child ordering requirements, so sorting is planned by the engine; this will become especially interesting once exchange starts supporting secondary sorting. I have tried a few sorting schemes, and this one is currently the fastest.
    - Only a single window specification (grouping and ordering) is processed at a time. The analyzer should take care of multiple window specifications.
    - The current implementation is semi-blocking; it processes one group at a time, so only the rows for one group per partition are kept in memory (see the sketch below). In the future we should also accommodate the case in which all aggregates are streaming (perhaps with some buffering).
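    
    A rough, self-contained sketch of the one-group-at-a-time idea (hypothetical helper, not the actual code in either branch): rows arrive sorted by the partition key, so only the current group is buffered before its window values are computed and emitted.
    
    ```scala
    import scala.collection.mutable.ArrayBuffer
    
    def processSortedPartition[K, R](
        sortedRows: Iterator[(K, R)],   // rows of one physical partition, sorted by key K
        evalGroup: Seq[R] => Seq[R]     // computes window values for one group
      ): Iterator[R] = {
      val buffered = sortedRows.buffered
      new Iterator[R] {
        private var pending: Iterator[R] = Iterator.empty
        def hasNext: Boolean = pending.hasNext || buffered.hasNext
        def next(): R = {
          if (!pending.hasNext) {
            // Buffer only the rows that share the current key, then evaluate them.
            val key = buffered.head._1
            val group = ArrayBuffer.empty[R]
            while (buffered.hasNext && buffered.head._1 == key) {
              group += buffered.next()._2
            }
            pending = evalGroup(group.toSeq).iterator
          }
          pending.next()
        }
      }
    }
    ```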
    
    Shall we try to join forces, and come up with one good PR?
    
    Kind regards,
    Herman




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95809882
  
      [Test build #30917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30917/consoleFull) for   PR 5604 at commit [`5b96e2a`](https://github.com/apache/spark/commit/5b96e2aa3e6da2a836171e4783c8199d21daed20).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
      * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
      * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
      * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
      * `case class WindowAggregate(`
      * `case class WindowAggregate(`
      * `  case class ComputedWindow(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-1442][SQL] Window Function Support for ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r55145289
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -195,9 +202,28 @@ case class Aggregate(
         child: LogicalPlan)
       extends UnaryNode {
     
    +  override lazy val resolved: Boolean = {
    +    val hasWindowExpressions = aggregateExpressions.exists ( _.collect {
    +        case window: WindowExpression => window
    +      }.nonEmpty
    +    )
    +
    +    !expressions.exists(!_.resolved) && childrenResolved && !hasWindowExpressions
    +  }
    +
       override def output: Seq[Attribute] = aggregateExpressions.map(_.toAttribute)
     }
     
    +case class Window(
    +    projectList: Seq[Attribute],
    --- End diff --
    
    @cloud-fan I found the reason: we still need `projectList`. When there is an expression in the window specification, `projectList` also contains the alias names.
    
    For example, 
    
    ```SQL 
    select key, value,
    sum(value) over (partition by key % 2 order by key) as sum
    from parquet_t1 group by key, value
    ```
    ```
    Project [key#19L,value#20,sum#18]
    +- Project [key#19L,value#20,_w0#24,_w1#25L,sum#18,sum#18]
       +- Window [key#19L,value#20,_w0#24,_w1#25L], [(sum(_w0#24),mode=Complete,isDistinct=false) windowspecdefinition(_w1#25L, key#19L ASC, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum#18], [_w1#25L], [key#19L ASC]
          +- Aggregate [key#19L,value#20], [key#19L,value#20,cast(value#20 as double) AS _w0#24,(key#19L % cast(2 as bigint)) AS _w1#25L]
             +- SubqueryAlias parquet_t1
                +- Relation[key#19L,value#20] ParquetRelation
    ```




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94714557
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30652/
    Test PASSed.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94708397
  
      [Test build #30650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30650/consoleFull) for   PR 5604 at commit [`138ff91`](https://github.com/apache/spark/commit/138ff91ec93163013861e8889f9944b38b0cd6a3).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
      * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
      * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
      * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
      * `case class WindowAggregate(`
      * `case class WindowAggregate(`
      * `  case class ComputedWindow(`
      * `  case class WindowFunctionInfo(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by guowei2 <gi...@git.apache.org>.
Github user guowei2 commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-97671084
  
    @scwf I think that is a good choice, thanks.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94708428
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30650/
    Test PASSed.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95774932
  
      [Test build #30900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30900/consoleFull) for   PR 5604 at commit [`b4fa747`](https://github.com/apache/spark/commit/b4fa74776f6ea767e160f81d93ae9ab49f0b88c7).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
      * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
      * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
      * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
      * `case class WindowAggregate(`
      * `case class WindowAggregate(`
      * `  case class ComputedWindow(`
      * `  case class WindowFunctionInfo(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94651277
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95809470
  
      [Test build #30917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30917/consoleFull) for   PR 5604 at commit [`5b96e2a`](https://github.com/apache/spark/commit/5b96e2aa3e6da2a836171e4783c8199d21daed20).




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95868952
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30928/
    Test FAILed.




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95852527
  
      [Test build #30928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30928/consoleFull) for   PR 5604 at commit [`d07101b`](https://github.com/apache/spark/commit/d07101bd5a6f3b30532c4d4d77ab8d310607b684).




[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-95774935
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30900/
    Test FAILed.




[GitHub] spark pull request #5604: [SPARK-1442][SQL] Window Function Support for Spar...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5604#discussion_r157933488
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
    @@ -0,0 +1,340 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions
    +
    +import org.apache.spark.sql.catalyst.analysis.UnresolvedException
    +import org.apache.spark.sql.catalyst.errors.TreeNodeException
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types.{NumericType, DataType}
    +
    +/**
    + * The trait of the Window Specification (specified in the OVER clause or WINDOW clause) for
    + * Window Functions.
    + */
    +sealed trait WindowSpec
    +
    +/**
    + * The specification for a window function.
    + * @param partitionSpec It defines the way that input rows are partitioned.
    + * @param orderSpec It defines the ordering of rows in a partition.
    + * @param frameSpecification It defines the window frame in a partition.
    + */
    +case class WindowSpecDefinition(
    +    partitionSpec: Seq[Expression],
    +    orderSpec: Seq[SortOrder],
    +    frameSpecification: WindowFrame) extends Expression with WindowSpec {
    +
    +  def validate: Option[String] = frameSpecification match {
    +    case UnspecifiedFrame =>
    +      Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " +
    +        "during analysis. Please file a bug report.")
    +    case frame: SpecifiedWindowFrame => frame.validate.orElse {
    +      def checkValueBasedBoundaryForRangeFrame(): Option[String] = {
    +        if (orderSpec.length > 1)  {
    +          // It is not allowed to have a value-based PRECEDING and FOLLOWING
    +          // as the boundary of a Range Window Frame.
    +          Some("This Range Window Frame only accepts at most one ORDER BY expression.")
    +        } else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) {
    +          Some("The data type of the expression in the ORDER BY clause should be a numeric type.")
    +        } else {
    +          None
    +        }
    +      }
    +
    +      (frame.frameType, frame.frameStart, frame.frameEnd) match {
    +        case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame()
    +        case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame()
    +        case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame()
    +        case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame()
    +        case (_, _, _) => None
    +      }
    +    }
    +  }
    +
    +  type EvaluatedType = Any
    +
    +  override def children: Seq[Expression]  = partitionSpec ++ orderSpec
    +
    +  override lazy val resolved: Boolean =
    +    childrenResolved && frameSpecification.isInstanceOf[SpecifiedWindowFrame]
    +
    +
    +  override def toString: String = simpleString
    +
    +  override def eval(input: Row): EvaluatedType = throw new UnsupportedOperationException
    +  override def nullable: Boolean = true
    +  override def foldable: Boolean = false
    +  override def dataType: DataType = throw new UnsupportedOperationException
    +}
    +
    +/**
    + * A Window specification reference that refers to the [[WindowSpecDefinition]] defined
    + * under the name `name`.
    + */
    +case class WindowSpecReference(name: String) extends WindowSpec
    +
    +/**
    + * The trait used to represent the type of a Window Frame.
    + */
    +sealed trait FrameType
    +
    +/**
    + * RowFrame treats rows in a partition individually. When a [[ValuePreceding]]
    + * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered
    + * as a physical offset.
    + * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame,
    + * from the row precedes the current row to the row follows the current row.
    + */
    +case object RowFrame extends FrameType
    +
    +/**
    + * RangeFrame treats rows in a partition as groups of peers.
    + * All rows having the same `ORDER BY` ordering are considered as peers.
    + * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]],
    + * the value is considered as a logical offset.
    + * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`,
    + * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values
    + * `expr` are in the range of [v-1, v+1].
    + *
    + * If `ORDER BY` clause is not defined, all rows in the partition is considered as peers
    + * of the current row.
    + */
    +case object RangeFrame extends FrameType
    +
    +/**
    + * The trait used to represent the type of a Window Frame Boundary.
    + */
    +sealed trait FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean
    +}
    +
    +/** UNBOUNDED PRECEDING boundary. */
    +case object UnboundedPreceding extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => true
    +    case vp: ValuePreceding => true
    +    case CurrentRow => true
    +    case vf: ValueFollowing => true
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = "UNBOUNDED PRECEDING"
    +}
    +
    +/** <value> PRECEDING boundary. */
    +case class ValuePreceding(value: Int) extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case ValuePreceding(anotherValue) => value >= anotherValue
    +    case CurrentRow => true
    +    case vf: ValueFollowing => true
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = s"$value PRECEDING"
    +}
    +
    +/** CURRENT ROW boundary. */
    +case object CurrentRow extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case vp: ValuePreceding => false
    +    case CurrentRow => true
    +    case vf: ValueFollowing => true
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = "CURRENT ROW"
    +}
    +
    +/** <value> FOLLOWING boundary. */
    +case class ValueFollowing(value: Int) extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case vp: ValuePreceding => false
    +    case CurrentRow => false
    +    case ValueFollowing(anotherValue) => value <= anotherValue
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = s"$value FOLLOWING"
    +}
    +
    +/** UNBOUNDED FOLLOWING boundary. */
    +case object UnboundedFollowing extends FrameBoundary {
    +  def notFollows(other: FrameBoundary): Boolean = other match {
    +    case UnboundedPreceding => false
    +    case vp: ValuePreceding => false
    +    case CurrentRow => false
    +    case vf: ValueFollowing => false
    +    case UnboundedFollowing => true
    +  }
    +
    +  override def toString: String = "UNBOUNDED FOLLOWING"
    +}
    +
    +/**
    + * The trait used to represent the a Window Frame.
    + */
    +sealed trait WindowFrame
    +
    +/** Used as a place holder when a frame specification is not defined.  */
    +case object UnspecifiedFrame extends WindowFrame
    +
    +/** A specified Window Frame. */
    +case class SpecifiedWindowFrame(
    +    frameType: FrameType,
    +    frameStart: FrameBoundary,
    +    frameEnd: FrameBoundary) extends WindowFrame {
    +
    +  /** If this WindowFrame is valid or not. */
    +  def validate: Option[String] = (frameType, frameStart, frameEnd) match {
    +    case (_, UnboundedFollowing, _) =>
    +      Some(s"$UnboundedFollowing is not allowed as the start of a Window Frame.")
    +    case (_, _, UnboundedPreceding) =>
    +      Some(s"$UnboundedPreceding is not allowed as the end of a Window Frame.")
    +    // case (RowFrame, start, end) => ??? RowFrame specific rule
    +    // case (RangeFrame, start, end) => ??? RangeFrame specific rule
    +    case (_, start, end) =>
    +      if (start.notFollows(end)) {
    +        None
    +      } else {
    +        val reason =
    +          s"The end of this Window Frame $end is smaller than the start of " +
    +          s"this Window Frame $start."
    +        Some(reason)
    +      }
    +  }
    +
    +  override def toString: String = frameType match {
    +    case RowFrame => s"ROWS BETWEEN $frameStart AND $frameEnd"
    +    case RangeFrame => s"RANGE BETWEEN $frameStart AND $frameEnd"
    +  }
    +}
    +
    +object SpecifiedWindowFrame {
    +  /**
    +   *
    +   * @param hasOrderSpecification If the window spec has order by expressions.
    +   * @param acceptWindowFrame If the window function accepts user-specified frame.
    +   * @return
    +   */
    +  def defaultWindowFrame(
    +      hasOrderSpecification: Boolean,
    +      acceptWindowFrame: Boolean): SpecifiedWindowFrame = {
    +    if (hasOrderSpecification && acceptWindowFrame) {
    --- End diff --
    
    
    
    There are window functions that do not support setting a window frame (e.g. `rank`). For them, `acceptWindowFrame` is false and the whole partition is the frame.
    
    For functions that do support setting a window frame, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. Note that, at this point, all rows considered peers of the current row are included in the frame. The `ORDER BY` clause determines whether two rows are peers: for example, `ORDER BY c` means that two rows with the same value in column `c` are peer rows. So, without an `ORDER BY` clause, every row is a peer of the current row, which means the frame is effectively the entire partition.
    
    
    Related references:
    - https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql
    - https://www.postgresql.org/docs/9.3/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS. 
    (you can search `default` and find the relevant parts)
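    
    As a small illustration of the defaults described above (it assumes a `sqlContext` with a registered table `sales(dept, amount)` and is not part of this patch):
    
    ```scala
    // With ORDER BY and no explicit frame, the default frame is
    // RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows with equal
    // `amount` within a `dept` (peers) all get the same running total.
    val withOrder = sqlContext.sql(
      """SELECT dept, amount,
        |       sum(amount) OVER (PARTITION BY dept ORDER BY amount) AS running_total
        |FROM sales""".stripMargin)
    
    // Without ORDER BY, every row in the partition is a peer of the current row,
    // so the default frame effectively covers the whole partition.
    val deptTotal = sqlContext.sql(
      """SELECT dept, amount,
        |       sum(amount) OVER (PARTITION BY dept) AS dept_total
        |FROM sales""".stripMargin)
    ```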





[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5604#issuecomment-94661415
  
      [Test build #30650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30650/consoleFull) for   PR 5604 at commit [`138ff91`](https://github.com/apache/spark/commit/138ff91ec93163013861e8889f9944b38b0cd6a3).

