Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/01 02:41:27 UTC

[GitHub] [spark] beliefer opened a new pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

beliefer opened a new pull request #29604:
URL: https://github.com/apache/spark/pull/29604


   ### What changes were proposed in this pull request?
   `NTH_VALUE` is an ANSI SQL window function. This PR adds support for it.
   For example:
   ```
   CREATE TEMPORARY TABLE empsalary (
       depname varchar,
       empno bigint,
       salary int,
       enroll_date date
   );
   
   INSERT INTO empsalary VALUES
   ('develop', 10, 5200, '2007-08-01'),
   ('sales', 1, 5000, '2006-10-01'),
   ('personnel', 5, 3500, '2007-12-10'),
   ('sales', 4, 4800, '2007-08-08'),
   ('personnel', 2, 3900, '2006-12-23'),
   ('develop', 7, 4200, '2008-01-01'),
   ('develop', 9, 4500, '2008-01-01'),
   ('sales', 3, 4800, '2007-08-01'),
   ('develop', 8, 6000, '2006-10-01'),
   ('develop', 11, 5200, '2007-08-15');
   
   select first_value(salary) over(order by salary range between 1000 preceding and 1000 following),
   	lead(salary) over(order by salary range between 1000 preceding and 1000 following),
   	nth_value(salary, 1) over(order by salary range between 1000 preceding and 1000 following),
   	salary from empsalary;
    first_value | lead | nth_value | salary 
   -------------+------+-----------+--------
           3500 | 3900 |      3500 |   3500
           3500 | 4200 |      3500 |   3900
           3500 | 4500 |      3500 |   4200
           3500 | 4800 |      3500 |   4500
           3900 | 4800 |      3900 |   4800
           3900 | 5000 |      3900 |   4800
           4200 | 5200 |      4200 |   5000
           4200 | 5200 |      4200 |   5200
           4200 | 6000 |      4200 |   5200
           5000 |      |      5000 |   6000
   (10 rows)
   ```
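   
   As a rough illustration of the `nth_value` column above, the following sketch models the function in plain Scala with no Spark dependency. It is not Spark's implementation: `NthValueRangeSketch`, `nthValue`, and `rangeFrame` are hypothetical names, and the RANGE frame is modeled as a simple filter over the sorted salaries.
   
   ```scala
   // Plain-Scala model of nth_value over a RANGE frame (illustrative only, no Spark types).
   object NthValueRangeSketch {
     /** Returns the n-th (1-based) value of the frame, or None when the frame has fewer than n rows. */
     def nthValue(frame: Seq[Int], n: Int): Option[Int] = {
       require(n > 0, "offset must be greater than zero")
       frame.lift(n - 1)
     }
   
     /** Rows visible to `current` under RANGE BETWEEN `preceding` PRECEDING AND `following` FOLLOWING. */
     def rangeFrame(sorted: Seq[Int], current: Int, preceding: Int, following: Int): Seq[Int] =
       sorted.filter(v => v >= current - preceding && v <= current + following)
   
     // Salaries from the empsalary example, in ORDER BY salary order.
     val salaries: Seq[Int] =
       Seq(5200, 5000, 3500, 4800, 3900, 4200, 4500, 4800, 6000, 5200).sorted
   
     // nth_value(salary, 1) evaluated for every row.
     val firstInFrame: Seq[Option[Int]] =
       salaries.map(s => nthValue(rangeFrame(salaries, s, 1000, 1000), 1))
   }
   ```
   
   With the `empsalary` data, `firstInFrame` reproduces the `nth_value` column of the result table, and `nthValue` returns `None` (SQL null) when the frame is shorter than the offset.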
   
   Several mainstream databases support this syntax:
   
   **PostgreSQL:**
   https://www.postgresql.org/docs/8.4/functions-window.html
   
   **Vertica:**
   https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Analytic/NTH_VALUEAnalytic.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAnalytic%20Functions%7C_____23
   
   **Oracle:**
   https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0
   
   **Redshift:**
   https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html
   
   **Presto:**
   https://prestodb.io/docs/current/functions/window.html
   
   **MySQL:**
   https://www.mysqltutorial.org/mysql-window-functions/mysql-nth_value-function/
   
   
   ### Why are the changes needed?
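   `NTH_VALUE` is an ANSI SQL window function that the databases listed above already support.
   It returns the value at an arbitrary row position within the window frame, which `first_value` and `last_value` cannot express.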
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing and new unit tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686195768








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694024318


   **[Test build #128805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128805/testReport)** for PR 29604 at commit [`4002aaf`](https://github.com/apache/spark/commit/4002aaf4f4558b939fd5481f12a4e29fd8e868d5).




[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489145248



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.dataType match {
+        case IntegerType | ShortType | ByteType =>
+          offsetExpr.eval().asInstanceOf[Int] match {
+            case i: Int if i <= 0 => TypeCheckFailure(
+              s"The 'offset' argument of nth_value must be greater than zero but it is $i.")
+            case _ => TypeCheckSuccess
+          }
+        case _ => TypeCheckFailure(
+          s"The 'offset' parameter must be a int literal but it is ${offsetExpr.dataType}.")
+      }
+    }
+  }
+
+  private lazy val offset = offsetExpr.eval().asInstanceOf[Int].toLong
+  private lazy val result = AttributeReference("result", input.dataType)()
+  private lazy val count = AttributeReference("count", LongType)()
+  private lazy val valueSet = AttributeReference("valueSet", BooleanType)()
+  override lazy val aggBufferAttributes: Seq[AttributeReference] =
+    result :: count :: valueSet :: Nil
+
+  override lazy val initialValues: Seq[Literal] = Seq(
+    /* result = */ Literal.create(null, input.dataType),
+    /* count = */ Literal(1L),
+    /* valueSet = */ Literal.create(false, BooleanType)
+  )
+
+  override lazy val updateExpressions: Seq[Expression] = {
+    if (ignoreNulls) {
+      Seq(
+        /* result = */ If(valueSet || input.isNull || count < offset, result, input),
+        /* count = */ If(input.isNull, count, count + 1L),
+        /* valueSet = */ valueSet || (input.isNotNull && count >= offset)
+      )
+    } else {
+      Seq(
+        /* result = */ If(valueSet || count < offset, result, input),
+        /* count = */ count + 1L,
+        /* valueSet = */ valueSet || count >= offset

Review comment:
       `count >= offset` implies that the required value has already been set. Once `count >= offset` holds, there is no need to overwrite `result` with the new `input`.
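   
   To make the role of `valueSet` concrete, here is a minimal plain-Scala model of the update expressions quoted in the diff above. It is a sketch, not the Catalyst code: `Buffer`, `update`, and `nthValue` are hypothetical names, and SQL nulls are modeled as `None`.
   
   ```scala
   // Plain-Scala model of the (result, count, valueSet) aggregation buffer from the diff.
   object NthValueBufferSketch {
     final case class Buffer[A](result: Option[A] = None, count: Long = 1L, valueSet: Boolean = false)
   
     // One update step. Every field is computed from the *old* buffer, as in updateExpressions.
     def update[A](buf: Buffer[A], input: Option[A], offset: Long, ignoreNulls: Boolean): Buffer[A] =
       if (ignoreNulls) {
         Buffer(
           result = if (buf.valueSet || input.isEmpty || buf.count < offset) buf.result else input,
           count = if (input.isEmpty) buf.count else buf.count + 1L,
           valueSet = buf.valueSet || (input.isDefined && buf.count >= offset))
       } else {
         Buffer(
           // Without the valueSet guard, every row with count >= offset would overwrite result.
           result = if (buf.valueSet || buf.count < offset) buf.result else input,
           count = buf.count + 1L,
           valueSet = buf.valueSet || buf.count >= offset)
       }
   
     // Folds the rows of one window frame through the buffer and returns the final result.
     def nthValue[A](rows: Seq[Option[A]], offset: Long, ignoreNulls: Boolean): Option[A] =
       rows.foldLeft(Buffer[A]())((b, in) => update(b, in, offset, ignoreNulls)).result
   }
   ```
   
   In this model, `nthValue(Seq(Some(10), None, Some(30)), 2L, ignoreNulls = false)` yields `None` because the second row is null, while the same call with `ignoreNulls = true` skips the null and yields `Some(30)`.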






[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693766703


   **[Test build #128787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128787/testReport)** for PR 29604 at commit [`4002aaf`](https://github.com/apache/spark/commit/4002aaf4f4558b939fd5481f12a4e29fd8e868d5).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694696963


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128845/
   Test FAILed.




[GitHub] [spark] beliefer commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694704753


   @cloud-fan @HyukjinKwon Thanks for all your reviews!




[GitHub] [spark] beliefer commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-691985532


   cc @Ngone51 




[GitHub] [spark] juliuszsompolski commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
juliuszsompolski commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r495598286



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,80 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offset starts at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an `offset`th row (e.g., when the offset is 10, size of the window frame
+      is less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - a positive int literal to indicate the offset in the window frame. It starts
+          with 1.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: offsetExpr :: Nil
+
+  override val frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else if (offset <= 0) {
+      TypeCheckFailure(
+        s"The 'offset' argument of nth_value must be greater than zero but it is $offset.")
+    } else {
+      TypeCheckSuccess
+    }
+  }
+
+  private lazy val offset = offsetExpr.eval().asInstanceOf[Int].toLong
+  private lazy val result = AttributeReference("result", input.dataType)()
+  private lazy val count = AttributeReference("count", LongType)()
+  override lazy val aggBufferAttributes: Seq[AttributeReference] = result :: count :: Nil
+
+  override lazy val initialValues: Seq[Literal] = Seq(
+    /* result = */ Literal.create(null, input.dataType),
+    /* count = */ Literal(1L)
+  )
+
+  override lazy val updateExpressions: Seq[Expression] = {
+    if (ignoreNulls) {
+      Seq(
+        /* result = */ If(count === offset && input.isNotNull, input, result),
+        /* count = */ If(input.isNull, count, count + 1L)
+      )
+    } else {
+      Seq(
+        /* result = */ If(count === offset, input, result),
+        /* count = */ count + 1L
+      )
+    }
+  }
+
+  override lazy val evaluateExpression: AttributeReference = result
+
+  override def toString: String = s"$prettyName($input, $offset)${if (ignoreNulls) " ignore nulls"}"

Review comment:
       This should `override def prettyName: String = "nth_value"` to match the name of the expression.
   Should it also `override def sql` to show the `ignoreNulls` parameter correctly?
   Is support for IGNORE NULLS / RESPECT NULLS in `nth_value` to be done in a follow-up? I don't see it in SqlBase.g4, as there is for FIRST/LAST.
   cc @beliefer @cloud-fan
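   
   A related detail in the quoted `toString`: under Scala 2.12/2.13, an `if` without an `else` evaluates to `()` when the condition is false, and `()` is interpolated literally. The sketch below reproduces this in plain Scala. `NthValueToStringSketch`, `asInDiff`, and `withElse` are hypothetical names, and the explicit `else ""` is only one possible fix, not necessarily what the PR ended up doing.
   
   ```scala
   // Plain-Scala reproduction of the interpolation pattern used in the quoted toString.
   object NthValueToStringSketch {
     // Same shape as the diff: no else branch, so the false case interpolates Unit as "()".
     def asInDiff(input: String, offset: Long, ignoreNulls: Boolean): String =
       s"nth_value($input, $offset)${if (ignoreNulls) " ignore nulls"}"
   
     // Possible fix: give the false case an explicit empty string.
     def withElse(input: String, offset: Long, ignoreNulls: Boolean): String =
       s"nth_value($input, $offset)${if (ignoreNulls) " ignore nulls" else ""}"
   }
   ```
   
   Here `asInDiff("salary", 2L, ignoreNulls = false)` produces `nth_value(salary, 2)()`, whereas `withElse` produces `nth_value(salary, 2)`.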






[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690971728








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686195768








[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694612414


   **[Test build #128845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128845/testReport)** for PR 29604 at commit [`501d564`](https://github.com/apache/spark/commit/501d5645a16e5477ca5e84fb7638a912abab829b).




[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489233337



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
##########
@@ -233,6 +233,17 @@ class AnalysisErrorSuite extends AnalysisTest {
           SpecifiedWindowFrame(RangeFrame, Literal(1), Literal(2)))).as("window")),
     "window frame" :: "must match the required frame" :: Nil)
 
+  errorTest(
+    "nth_value window function",
+    testRelation2.select(
+      WindowExpression(
+        new NthValue(AttributeReference("b", IntegerType)(), Literal(0)),
+        WindowSpecDefinition(
+          UnresolvedAttribute("a") :: Nil,
+          SortOrder(UnresolvedAttribute("b"), Ascending) :: Nil,
+          SpecifiedWindowFrame(RowFrame, Literal(0), Literal(0)))).as("window")),
+    "The 'offset' argument of nth_value must be greater than zero but it is 0." :: Nil)

Review comment:
       Can we test one more case where the offset parameter is not an int?
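   
   The cases such a test would need to cover can be sketched in plain Scala. This is a simplified stand-in, not the Catalyst `checkInputDataTypes` logic: `OffsetCheckSketch`, `Offset`, `IntOffset`, `OtherOffset`, and `check` are hypothetical names, and the error messages are copied from the earlier revision of the diff in this thread.
   
   ```scala
   // Simplified model of the offset validation; none of these are Spark types.
   object OffsetCheckSketch {
     sealed trait Offset
     final case class IntOffset(value: Int) extends Offset
     final case class OtherOffset(typeName: String) extends Offset
   
     // Left carries the error message, Right signals that the offset is acceptable.
     def check(offset: Offset): Either[String, Unit] = offset match {
       case IntOffset(i) if i <= 0 =>
         Left(s"The 'offset' argument of nth_value must be greater than zero but it is $i.")
       case IntOffset(_) =>
         Right(())
       case OtherOffset(t) =>
         Left(s"The 'offset' parameter must be a int literal but it is $t.")
     }
   }
   ```
   
   A test suite would then assert one failure for a non-positive offset, one failure for a non-int offset, and one success for a positive int.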






[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693217111


   **[Test build #128749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128749/testReport)** for PR 29604 at commit [`8778412`](https://github.com/apache/spark/commit/8778412afa9a50e9dace13f140bb7c253fc0dfc7).




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694202813


   **[Test build #128805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128805/testReport)** for PR 29604 at commit [`4002aaf`](https://github.com/apache/spark/commit/4002aaf4f4558b939fd5481f12a4e29fd8e868d5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693276341








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690972455








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686474175








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489187406



##########
File path: sql/core/src/test/resources/sql-tests/inputs/window.sql
##########
@@ -124,4 +144,26 @@ WINDOW w AS (PARTITION BY cate ORDER BY val);
 -- with filter predicate
 SELECT val, cate,
 count(val) FILTER (WHERE val > 1) OVER(PARTITION BY cate)
-FROM testData ORDER BY cate, val;
\ No newline at end of file
+FROM testData ORDER BY cate, val;
+
+-- nth_value() over ()
+SELECT
+    employee_name,
+    salary,
+    nth_value(employee_name, 2) OVER (ORDER BY salary DESC) second_highest_salary
+FROM
+    basic_pays
+ORDER BY salary DESC;
+
+SELECT
+	employee_name,
+	department,
+	salary,
+	NTH_VALUE(employee_name, 2) OVER  (
+		PARTITION BY department
+		ORDER BY salary DESC
+		RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Review comment:
       OK






[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693280521


   **[Test build #128760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128760/testReport)** for PR 29604 at commit [`97f6376`](https://github.com/apache/spark/commit/97f63762c466f86e85f440b99e35eacfbeaa5c53).




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-691114027


   **[Test build #128563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128563/testReport)** for PR 29604 at commit [`2aee591`](https://github.com/apache/spark/commit/2aee5916770e89e7257b6c7936091746f14faace).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687367597








[GitHub] [spark] cloud-fan commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694015979


   retest this please




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687310222


   **[Test build #128308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128308/testReport)** for PR 29604 at commit [`d95a7b7`](https://github.com/apache/spark/commit/d95a7b759cf9675d9fc97060f3873d594b1fe023).
    * This patch passes all tests.
    * This patch **does not merge cleanly**.
    * This patch adds the following public classes _(experimental)_:
     * `case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)`




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694008769








[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-684161442


   **[Test build #128129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128129/testReport)** for PR 29604 at commit [`c9a96c3`](https://github.com/apache/spark/commit/c9a96c30a0dfc83d33e4a23b9236a398b2278584).




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-684159399








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489121693



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the

Review comment:
       OK






[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693217771


   **[Test build #128749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128749/testReport)** for PR 29604 at commit [`8778412`](https://github.com/apache/spark/commit/8778412afa9a50e9dace13f140bb7c253fc0dfc7).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690972455








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r490664576



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,81 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less
+      than 10), null is returned.

Review comment:
       OK
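For readers following the usage text quoted above, here is a minimal stand-alone sketch of the documented semantics: 1-based offsets, optional null skipping, and a null result when the frame is too short. It is a hypothetical model in plain Scala, not the Catalyst implementation from this PR. The names `NthValueSketch` and `nthValue` are assumptions made for illustration, and `Option` stands in for SQL null.

```scala
// Hypothetical model of the documented nth_value semantics; not Spark's implementation.
object NthValueSketch {

  /** Returns the `offset`th (1-based) value of `frame`, or None when there is no such row. */
  def nthValue[A](
      frame: Seq[Option[A]],
      offset: Int,
      ignoreNulls: Boolean = false): Option[A] = {
    require(offset > 0, s"offset must be greater than zero but it is $offset")
    // With ignoreNulls = true, null rows (None) do not count towards the offset.
    val candidates = if (ignoreNulls) frame.filter(_.isDefined) else frame
    // lift returns None when the frame has fewer than `offset` candidate rows.
    candidates.lift(offset - 1).flatten
  }

  def main(args: Array[String]): Unit = {
    val frame = Seq(Some(3500), None, Some(4200))
    println(nthValue(frame, 2))                     // None: the 2nd row is null
    println(nthValue(frame, 2, ignoreNulls = true)) // Some(4200): the null row is skipped
    println(nthValue(frame, 10))                    // None: the frame has fewer than 10 rows
  }
}
```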






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693276341


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687311186








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687232591








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-691115228








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693767074








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r490663142



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,81 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less
+      than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - a positive int literal to indicate the offset in the window frame. It starts
+          with 1.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: offsetExpr :: Nil
+
+  override val frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.eval().asInstanceOf[Int] match {
+        case i: Int if i <= 0 => TypeCheckFailure(

Review comment:
       OK






[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693276090


   **[Test build #128756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128756/testReport)** for PR 29604 at commit [`db2b1d4`](https://github.com/apache/spark/commit/db2b1d4bc9272a31af47365f4526b14f54815b3d).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693217787








[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687240165


   **[Test build #128309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128309/testReport)** for PR 29604 at commit [`addcdbc`](https://github.com/apache/spark/commit/addcdbc98b89ddaa58722be5687cd857b2a22d33).




[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r490218437



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,81 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less
+      than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - a positive int literal to indicate the offset in the window frame. It starts
+          with 1.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: offsetExpr :: Nil
+
+  override val frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.eval().asInstanceOf[Int] match {
+        case i: Int if i <= 0 => TypeCheckFailure(

Review comment:
       or simpler `if (offset <= 0)`
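One way the suggested simplification could look, sketched without a Spark dependency. The `CheckResult` types below are stand-ins for Catalyst's `TypeCheckResult`, `TypeCheckSuccess` and `TypeCheckFailure`, and `checkOffset` with its two parameters is a hypothetical helper. In the real class the offset would come from the already evaluated `offsetExpr`. The point is only that a plain `if`/`else if` chain replaces the `match` on a cast value.

```scala
// Stand-alone sketch; CheckResult and checkOffset are hypothetical stand-ins, not Spark API.
object OffsetCheckSketch {
  sealed trait CheckResult
  case object CheckSuccess extends CheckResult
  final case class CheckFailure(message: String) extends CheckResult

  /** Validates the offset with a flat if/else chain instead of matching on a cast value. */
  def checkOffset(foldable: Boolean, offset: Int): CheckResult =
    if (!foldable) {
      CheckFailure("Offset expression must be a literal.")
    } else if (offset <= 0) {
      CheckFailure(
        s"The 'offset' argument of nth_value must be greater than zero but it is $offset.")
    } else {
      CheckSuccess
    }

  def main(args: Array[String]): Unit = {
    println(checkOffset(foldable = true, offset = 2))  // CheckSuccess
    println(checkOffset(foldable = true, offset = 0))  // CheckFailure(...)
    println(checkOffset(foldable = false, offset = 2)) // CheckFailure(...)
  }
}
```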






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690972455








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693258818








[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690971728








[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489229928



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,85 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less
+      than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - a positive int literal to indicate the offset in the window frame. It starts with 1.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)

Review comment:
       `ignoreNulls` is not a child (it's Boolean), so this should be `Seq(AnyDataType, IntegerType)`.
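A simplified illustration of the point above, assuming that expected input types are meant to pair up one-to-one with the expression's children. Everything in the sketch (`SimpleType`, `Expr`, `typesLineUp`) is a hypothetical stand-in, and the strict length check is a choice made for the sketch, not a claim about how Catalyst validates `inputTypes`. Because `ignoreNulls` is a plain `Boolean` constructor argument, it has no slot in either list.

```scala
// Hypothetical model of pairing children with expected input types; not Catalyst code.
object InputTypesSketch {
  sealed trait SimpleType
  case object AnyT extends SimpleType
  case object IntT extends SimpleType
  case object BoolT extends SimpleType

  final case class Expr(name: String, tpe: SimpleType)

  /** True when there is exactly one expected type per child and each child matches it. */
  def typesLineUp(children: Seq[Expr], inputTypes: Seq[SimpleType]): Boolean =
    children.length == inputTypes.length &&
      children.zip(inputTypes).forall { case (child, expected) =>
        expected == AnyT || child.tpe == expected
      }

  def main(args: Array[String]): Unit = {
    // Only `input` and `offsetExpr` are child expressions of the function.
    val children = Seq(Expr("input", AnyT), Expr("offsetExpr", IntT))
    println(typesLineUp(children, Seq(AnyT, IntT)))        // true
    println(typesLineUp(children, Seq(AnyT, IntT, BoolT))) // false: no child for BoolT
  }
}
```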






[GitHub] [spark] cloud-fan closed pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #29604:
URL: https://github.com/apache/spark/pull/29604


   




[GitHub] [spark] cloud-fan commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694697121


   GitHub Actions passed. Thanks, merging to master!




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693281188








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694008769


   Merged build finished. Test FAILed.




[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489187346



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.dataType match {
+        case IntegerType | ShortType | ByteType =>
+          offsetExpr.eval().asInstanceOf[Int] match {
+            case i: Int if i <= 0 => TypeCheckFailure(
+              s"The 'offset' argument of nth_value must be greater than zero but it is $i.")
+            case _ => TypeCheckSuccess
+          }
+        case _ => TypeCheckFailure(
+          s"The 'offset' parameter must be a int literal but it is ${offsetExpr.dataType}.")
+      }
+    }
+  }
+
+  private lazy val offset = offsetExpr.eval().asInstanceOf[Int].toLong
+  private lazy val result = AttributeReference("result", input.dataType)()
+  private lazy val count = AttributeReference("count", LongType)()
+  private lazy val valueSet = AttributeReference("valueSet", BooleanType)()
+  override lazy val aggBufferAttributes: Seq[AttributeReference] =
+    result :: count :: valueSet :: Nil
+
+  override lazy val initialValues: Seq[Literal] = Seq(
+    /* result = */ Literal.create(null, input.dataType),
+    /* count = */ Literal(1L),
+    /* valueSet = */ Literal.create(false, BooleanType)

Review comment:
       OK






[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r490216898



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,81 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less

Review comment:
       `offset` -> ``` `offset`th```






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686319835








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687231719


   **[Test build #128308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128308/testReport)** for PR 29604 at commit [`d95a7b7`](https://github.com/apache/spark/commit/d95a7b759cf9675d9fc97060f3873d594b1fe023).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693351109


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128760/
   Test FAILed.




[GitHub] [spark] beliefer removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-691985532


   cc @Ngone51 




[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687231719


   **[Test build #128308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128308/testReport)** for PR 29604 at commit [`d95a7b7`](https://github.com/apache/spark/commit/d95a7b759cf9675d9fc97060f3873d594b1fe023).




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687367597








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694025981








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686300124


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128220/
   Test FAILed.




[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r482664106



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##########
@@ -993,6 +993,64 @@ object functions {
     Lead(e.expr, Literal(offset), Literal(defaultValue))
   }
 
+  /**
+   * Window function: returns the value that is the `offset`th row of the window frame
+   * (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+   *
+   * It will return the `offset`th non-null value it sees when ignoreNulls is set to true.
+   * If all values are null, then null is returned.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.1.0
+   */
+  def nth_value(columnName: String, offset: Int, ignoreNulls: Boolean): Column = {
+    nth_value(Column(columnName), offset, ignoreNulls)
+  }
+
+  /**
+   * Window function: returns the value that is the `offset`th row of the window frame
+   * (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+   *
+   * It will return the `offset`th non-null value it sees when ignoreNulls is set to true.
+   * If all values are null, then null is returned.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.1.0
+   */
+  def nth_value(e: Column, offset: Int, ignoreNulls: Boolean): Column = withExpr {
+    NthValue(e.expr, Literal(offset), ignoreNulls)
+  }
+
+  /**
+   * Window function: returns the value that is the `offset`th row of the window frame
+   * (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.1.0
+   */
+  def nth_value(columnName: String, offset: Int): Column = {

Review comment:
       OK
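For readers of this thread, the semantics described in the Scaladoc above can be modeled in plain Scala without Spark. This is only a sketch under stated assumptions: `NthValueSemanticsSketch` and `nthValue` are hypothetical names, the window frame is passed as an already-ordered `Seq`, and `Option` stands in for a nullable column value. It is not the PR's implementation.

```scala
object NthValueSemanticsSketch {
  // Hypothetical model of nth_value over one window frame.
  // `frame` holds the frame's rows in order; None represents a SQL null.
  def nthValue[A](frame: Seq[Option[A]], offset: Int, ignoreNulls: Boolean = false): Option[A] = {
    // Offsets are 1-based and must be positive, as the Scaladoc states.
    require(offset > 0, s"offset must be greater than zero but it is $offset")
    // With ignoreNulls, only non-null rows count toward the offset.
    val candidates = if (ignoreNulls) frame.filter(_.isDefined) else frame
    // lift returns None when the frame has fewer than `offset` candidate rows.
    candidates.lift(offset - 1).flatten
  }
}
```

With the frame `Seq(Some(10), None, Some(30))` and offset 2, the sketch yields `None` by default (the second row is null) and `Some(30)` when `ignoreNulls = true` (the second non-null row).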






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693258818








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489252543



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
##########
@@ -233,6 +233,17 @@ class AnalysisErrorSuite extends AnalysisTest {
           SpecifiedWindowFrame(RangeFrame, Literal(1), Literal(2)))).as("window")),
     "window frame" :: "must match the required frame" :: Nil)
 
+  errorTest(
+    "nth_value window function",
+    testRelation2.select(
+      WindowExpression(
+        new NthValue(AttributeReference("b", IntegerType)(), Literal(0)),
+        WindowSpecDefinition(
+          UnresolvedAttribute("a") :: Nil,
+          SortOrder(UnresolvedAttribute("b"), Ascending) :: Nil,
+          SpecifiedWindowFrame(RowFrame, Literal(0), Literal(0)))).as("window")),
+    "The 'offset' argument of nth_value must be greater than zero but it is 0." :: Nil)

Review comment:
       OK






[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693351102








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693351102


   Merged build finished. Test FAILed.




[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489227880



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -476,7 +476,7 @@ case class Lag(input: Expression, offset: Expression, default: Expression)
 
 abstract class AggregateWindowFunction extends DeclarativeAggregate with WindowFunction {
   self: Product =>
-  override val frame = SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow)
+  override def frame: WindowFrame = SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow)

Review comment:
       Is this change necessary?
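One possible reading of the diff, not confirmed anywhere in this thread: in Scala 2, `override val frame = SpecifiedWindowFrame(...)` gets the inferred type of its right-hand side, so a subclass could not override it with a different kind of frame such as `UnspecifiedFrame`. Declaring the member with the wider `WindowFrame` type removes that restriction. The sketch below uses hypothetical stand-in types (`WindowFrame`, `SpecifiedFrame`, `Base`, `NthValueLike`) rather than the Catalyst classes.

```scala
object FrameOverrideSketch {
  // Stand-ins for Catalyst's frame hierarchy (assumed shapes, for illustration only).
  sealed trait WindowFrame
  case object UnspecifiedFrame extends WindowFrame
  final case class SpecifiedFrame(description: String) extends WindowFrame

  abstract class Base {
    // The explicit `WindowFrame` annotation keeps the member's type wide.
    // Without it, the inferred type would be `SpecifiedFrame`, and the
    // override in NthValueLike would be rejected as an incompatible type.
    def frame: WindowFrame = SpecifiedFrame("rows between unbounded preceding and current row")
  }

  class NthValueLike extends Base {
    // A val may override a def; the type conforms to WindowFrame.
    override val frame: WindowFrame = UnspecifiedFrame
  }
}
```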






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693767074








[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r490216200



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,81 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less
+      than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - a positive int literal to indicate the offset in the window frame. It starts
+          with 1.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: offsetExpr :: Nil
+
+  override val frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.eval().asInstanceOf[Int] match {
+        case i: Int if i <= 0 => TypeCheckFailure(

Review comment:
       nit:
   ```
   if (offsetExpr.eval().asInstanceOf[Int] <= 0) {
     fail ...
   } else ...
   ```
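The if/else shape suggested in the nit can be sketched in plain Scala as follows. This is an illustration under assumptions, not the PR's code: `OffsetCheckSketch` and `checkOffset` are hypothetical names, and `Either[String, Unit]` stands in for Catalyst's `TypeCheckResult`. The error message is the one asserted in the PR's `AnalysisErrorSuite` test.

```scala
object OffsetCheckSketch {
  // Left carries the failure message; Right(()) plays the role of a successful check.
  def checkOffset(offset: Int): Either[String, Unit] =
    if (offset <= 0) {
      Left(s"The 'offset' argument of nth_value must be greater than zero but it is $offset.")
    } else {
      Right(())
    }
}
```

Compared with matching on the evaluated value, the plain conditional has two visible branches and needs no catch-all case.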






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686300113


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694612817








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489187186



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.dataType match {
+        case IntegerType | ShortType | ByteType =>
+          offsetExpr.eval().asInstanceOf[Int] match {
+            case i: Int if i <= 0 => TypeCheckFailure(
+              s"The 'offset' argument of nth_value must be greater than zero but it is $i.")
+            case _ => TypeCheckSuccess
+          }
+        case _ => TypeCheckFailure(
+          s"The 'offset' parameter must be a int literal but it is ${offsetExpr.dataType}.")
+      }
+    }
+  }
+
+  private lazy val offset = offsetExpr.eval().asInstanceOf[Int].toLong
+  private lazy val result = AttributeReference("result", input.dataType)()
+  private lazy val count = AttributeReference("count", LongType)()
+  private lazy val valueSet = AttributeReference("valueSet", BooleanType)()
+  override lazy val aggBufferAttributes: Seq[AttributeReference] =
+    result :: count :: valueSet :: Nil
+
+  override lazy val initialValues: Seq[Literal] = Seq(
+    /* result = */ Literal.create(null, input.dataType),
+    /* count = */ Literal(1L),
+    /* valueSet = */ Literal.create(false, BooleanType)
+  )
+
+  override lazy val updateExpressions: Seq[Expression] = {
+    if (ignoreNulls) {
+      Seq(
+        /* result = */ If(valueSet || input.isNull || count < offset, result, input),
+        /* count = */ If(input.isNull, count, count + 1L),
+        /* valueSet = */ valueSet || (input.isNotNull && count >= offset)
+      )
+    } else {
+      Seq(
+        /* result = */ If(valueSet || count < offset, result, input),
+        /* count = */ count + 1L,
+        /* valueSet = */ valueSet || count >= offset

Review comment:
       We now use:
   ```
   /* result = */ If(count === offset, input, result),
   /* count = */ count + 1L
   ```
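To see why the two-field update quoted in this comment is enough without a `valueSet` flag, it can be traced outside Spark. The sketch below is a hedged model, not the PR's code: `Buffer`, `update`, and `evaluate` are hypothetical names, `Option[Int]` stands in for a nullable input, and only the branch without `ignoreNulls` is modeled. The initial values (`result = null`, `count = 1`) follow the diff above.

```scala
object NthValueBufferSketch {
  // Mirrors the two aggregation buffer attributes: result and count.
  final case class Buffer(result: Option[Int], count: Long)

  // initialValues in the diff: result = null, count = 1L.
  val initial: Buffer = Buffer(result = None, count = 1L)

  // Models If(count === offset, input, result) and count + 1L for one input row.
  def update(buf: Buffer, input: Option[Int], offset: Long): Buffer =
    Buffer(
      result = if (buf.count == offset) input else buf.result,
      count = buf.count + 1L
    )

  // Feeds every row of the frame through update and returns the final result.
  def evaluate(frame: Seq[Option[Int]], offset: Long): Option[Int] =
    frame.foldLeft(initial)((buf, in) => update(buf, in, offset)).result
}
```

Because `count` only increases, it equals `offset` for at most one row, so `result` is written at most once and later rows cannot overwrite it.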

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.dataType match {
+        case IntegerType | ShortType | ByteType =>
+          offsetExpr.eval().asInstanceOf[Int] match {
+            case i: Int if i <= 0 => TypeCheckFailure(
+              s"The 'offset' argument of nth_value must be greater than zero but it is $i.")
+            case _ => TypeCheckSuccess
+          }
+        case _ => TypeCheckFailure(
+          s"The 'offset' parameter must be a int literal but it is ${offsetExpr.dataType}.")
+      }
+    }
+  }
+
+  private lazy val offset = offsetExpr.eval().asInstanceOf[Int].toLong
+  private lazy val result = AttributeReference("result", input.dataType)()
+  private lazy val count = AttributeReference("count", LongType)()
+  private lazy val valueSet = AttributeReference("valueSet", BooleanType)()
+  override lazy val aggBufferAttributes: Seq[AttributeReference] =
+    result :: count :: valueSet :: Nil
+
+  override lazy val initialValues: Seq[Literal] = Seq(
+    /* result = */ Literal.create(null, input.dataType),
+    /* count = */ Literal(1L),
+    /* valueSet = */ Literal.create(false, BooleanType)
+  )
+
+  override lazy val updateExpressions: Seq[Expression] = {
+    if (ignoreNulls) {
+      Seq(
+        /* result = */ If(valueSet || input.isNull || count < offset, result, input),
+        /* count = */ If(input.isNull, count, count + 1L),
+        /* valueSet = */ valueSet || (input.isNotNull && count >= offset)
+      )
+    } else {
+      Seq(
+        /* result = */ If(valueSet || count < offset, result, input),
+        /* count = */ count + 1L,
+        /* valueSet = */ valueSet || count >= offset

Review comment:
       We use
   ```
   /* result = */ If(count === offset, input, result),
   /* count = */ count + 1L
   ```
   now.
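
To see why the shorter rule can replace the `valueSet` flag in the non-`ignoreNulls` branch, the two update rules can be compared outside Spark. The following is a minimal sketch, not the Catalyst code: `OldBuf`, `NewBuf`, `runOld` and `runNew` are hypothetical names, SQL null is modeled as `None`, and the buffer is assumed to start with `count = 1` as in the `initialValues` shown in the diff.

```scala
object NthValueUpdateSketch {
  // Buffer for the earlier three-field formulation (result, count, valueSet).
  final case class OldBuf(result: Option[Int], count: Long, valueSet: Boolean)
  // Buffer for the simplified two-field formulation (result, count).
  final case class NewBuf(result: Option[Int], count: Long)

  // Earlier rule: keep `result` once set, otherwise take `input` when count >= offset.
  def oldUpdate(b: OldBuf, input: Option[Int], offset: Long): OldBuf =
    OldBuf(
      result = if (b.valueSet || b.count < offset) b.result else input,
      count = b.count + 1L,
      valueSet = b.valueSet || b.count >= offset)

  // Simplified rule: take `input` exactly when count == offset.
  def newUpdate(b: NewBuf, input: Option[Int], offset: Long): NewBuf =
    NewBuf(
      result = if (b.count == offset) input else b.result,
      count = b.count + 1L)

  // Fold each rule over the rows of a frame, starting from count = 1.
  def runOld(rows: Seq[Option[Int]], offset: Long): Option[Int] =
    rows.foldLeft(OldBuf(None, 1L, valueSet = false))((b, in) => oldUpdate(b, in, offset)).result

  def runNew(rows: Seq[Option[Int]], offset: Long): Option[Int] =
    rows.foldLeft(NewBuf(None, 1L))((b, in) => newUpdate(b, in, offset)).result
}
```

Because `count` increases by one on every row, it equals `offset` on exactly one row, so the equality test alone is enough to latch the value in this model.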




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690971728








[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489231123



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,85 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less
+      than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - a positive int literal to indicate the offset in the window frame. It starts with 1.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.dataType match {

Review comment:
       This will always be `IntegerType` by the time we reach here, because the class extends `ImplicitCastInputTypes`. We can just do `offsetExpr.eval().asInstanceOf[Int] match ...`
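
The simplification suggested above amounts to checking only the value of the offset once its type is guaranteed. A minimal standalone sketch of that shape follows; `CheckResult`, `CheckSuccess` and `CheckFailure` are hypothetical stand-ins for Spark's `TypeCheckResult` hierarchy, and the error text is copied from the diff under review.

```scala
object OffsetCheckSketch {
  // Simplified stand-ins for Spark's TypeCheckResult types (assumption, not the real API).
  sealed trait CheckResult
  case object CheckSuccess extends CheckResult
  final case class CheckFailure(message: String) extends CheckResult

  // With the offset already known to be an Int, only its value needs validating.
  def checkOffset(offset: Int): CheckResult =
    if (offset <= 0) {
      CheckFailure(
        s"The 'offset' argument of nth_value must be greater than zero but it is $offset.")
    } else {
      CheckSuccess
    }
}
```

In the real expression the foldability check would still come first, since the offset has to be evaluated before its value can be inspected.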






[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686202570


   **[Test build #128220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128220/testReport)** for PR 29604 at commit [`bc0c308`](https://github.com/apache/spark/commit/bc0c308ee8fa934c2e5a588a7460f7d8045fba62).




[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489128367



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null

Review comment:
       If ignoreNulls=true, we will skip nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`.
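
The difference between the two modes can be illustrated on a plain sequence. This is a sketch of the described semantics only, not of how Spark evaluates the window function; `nthValue` here is a hypothetical helper, the frame is a `Seq[Option[Int]]`, and `None` stands for SQL null.

```scala
object IgnoreNullsSketch {
  // Returns the value at the 1-based `offset` within `frame`; None models SQL null.
  def nthValue(frame: Seq[Option[Int]], offset: Int, ignoreNulls: Boolean): Option[Int] = {
    require(offset > 0, "offset must be greater than zero")
    // With ignoreNulls, null rows are dropped before counting; otherwise every row counts.
    val candidates = if (ignoreNulls) frame.filter(_.isDefined) else frame
    // A missing row and a null row both come back as None.
    candidates.drop(offset - 1).headOption.flatten
  }
}
```

For a frame of `Some(1), None, Some(3)` and offset 2, the second row is null when every row counts, while skipping nulls makes the second counted row the one holding 3.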






[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693766703


   **[Test build #128787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128787/testReport)** for PR 29604 at commit [`4002aaf`](https://github.com/apache/spark/commit/4002aaf4f4558b939fd5481f12a4e29fd8e868d5).




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690971728








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489243512



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -476,7 +476,7 @@ case class Lag(input: Expression, offset: Expression, default: Expression)
 
 abstract class AggregateWindowFunction extends DeclarativeAggregate with WindowFunction {
   self: Product =>
-  override val frame = SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow)
+  override def frame: WindowFrame = SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow)

Review comment:
       OK, let's revert it.
   `NthValue` will instead override it as `override val frame: WindowFrame = UnspecifiedFrame`.
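
The reason the subclass has to use `val` once the base class is reverted is a Scala rule: a `val` may be overridden by another `val`, but not by a `def`. A minimal sketch with stand-in types follows; `Frame`, `WindowFn`, `AggregateFn`, `RowNumberLike` and `NthValueLike` are hypothetical names, and the explicit `Frame` annotation on the base `val` is an assumption made so that the subclass can supply a different subtype.

```scala
object FrameOverrideSketch {
  // Stand-ins for WindowFrame and its implementations.
  sealed trait Frame
  case object RowsUpToCurrent extends Frame
  case object Unspecified extends Frame

  abstract class WindowFn {
    def frame: Frame
  }

  // Mirrors the reverted base class: `frame` stays a `val` with a default.
  abstract class AggregateFn extends WindowFn {
    override val frame: Frame = RowsUpToCurrent
  }

  // Keeps the inherited default frame.
  class RowNumberLike extends AggregateFn

  // Overriding a `val` with a `val` compiles; `override def frame` would be rejected here.
  class NthValueLike extends AggregateFn {
    override val frame: Frame = Unspecified
  }
}
```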






[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489247134



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,85 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less
+      than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - a positive int literal to indicate the offset in the window frame. It starts with 1.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.dataType match {

Review comment:
       Yeah!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687241043








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-684541775


   **[Test build #128129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128129/testReport)** for PR 29604 at commit [`c9a96c3`](https://github.com/apache/spark/commit/c9a96c30a0dfc83d33e4a23b9236a398b2278584).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)`




[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693280521


   **[Test build #128760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128760/testReport)** for PR 29604 at commit [`97f6376`](https://github.com/apache/spark/commit/97f63762c466f86e85f440b99e35eacfbeaa5c53).




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686472510


   **[Test build #128241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128241/testReport)** for PR 29604 at commit [`bc0c308`](https://github.com/apache/spark/commit/bc0c308ee8fa934c2e5a588a7460f7d8045fba62).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686319835








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687241043








[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686319182


   **[Test build #128241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128241/testReport)** for PR 29604 at commit [`bc0c308`](https://github.com/apache/spark/commit/bc0c308ee8fa934c2e5a588a7460f7d8045fba62).




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694205247








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693217111


   **[Test build #128749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128749/testReport)** for PR 29604 at commit [`8778412`](https://github.com/apache/spark/commit/8778412afa9a50e9dace13f140bb7c253fc0dfc7).




[GitHub] [spark] beliefer commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686315967


   retest this please




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694696950








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-684547547








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489124118



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in

Review comment:
       OK






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687232591








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690972455








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489256112



##########
File path: sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part3.sql.out
##########
@@ -385,6 +385,15 @@ org.apache.spark.sql.AnalysisException
 cannot resolve 'ntile(0)' due to data type mismatch: Buckets expression must be positive, but got: 0; line 1 pos 7
 
 
+-- !query
+SELECT nth_value(four, 0) OVER (ORDER BY ten), ten, four FROM tenk1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'nthvalue(spark_catalog.default.tenk1.`four`)' due to data type mismatch: The 'offset' argument of nth_value must be greater than zero but it is 0.; line 1 pos 7

Review comment:
       Yes.
   ```
   SELECT nth_value(four, 0) OVER (ORDER BY ten), ten, four FROM tenk1;
   ERROR:  argument of nth_value must be greater than zero
   ```






[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-684547547








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686300113








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690972455








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686319182


   **[Test build #128241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128241/testReport)** for PR 29604 at commit [`bc0c308`](https://github.com/apache/spark/commit/bc0c308ee8fa934c2e5a588a7460f7d8045fba62).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693276348


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128756/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690971728


   **[Test build #128563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128563/testReport)** for PR 29604 at commit [`2aee591`](https://github.com/apache/spark/commit/2aee5916770e89e7257b6c7936091746f14faace).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693217787


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694612817








[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694024318


   **[Test build #128805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128805/testReport)** for PR 29604 at commit [`4002aaf`](https://github.com/apache/spark/commit/4002aaf4f4558b939fd5481f12a4e29fd8e868d5).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694008799


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128787/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687366711


   **[Test build #128309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128309/testReport)** for PR 29604 at commit [`addcdbc`](https://github.com/apache/spark/commit/addcdbc98b89ddaa58722be5687cd857b2a22d33).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `abstract class BlockTransferService extends BlockStoreClient `
     * `case class StreamingRelationV2(`




[GitHub] [spark] beliefer commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686551122


   cc @cloud-fan 




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693214552








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686474175








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694612414


   **[Test build #128845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128845/testReport)** for PR 29604 at commit [`501d564`](https://github.com/apache/spark/commit/501d5645a16e5477ca5e84fb7638a912abab829b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-691115228








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693350685


   **[Test build #128760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128760/testReport)** for PR 29604 at commit [`97f6376`](https://github.com/apache/spark/commit/97f63762c466f86e85f440b99e35eacfbeaa5c53).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-684161442


   **[Test build #128129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128129/testReport)** for PR 29604 at commit [`c9a96c3`](https://github.com/apache/spark/commit/c9a96c30a0dfc83d33e4a23b9236a398b2278584).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694205247








[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693258202


   **[Test build #128756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128756/testReport)** for PR 29604 at commit [`db2b1d4`](https://github.com/apache/spark/commit/db2b1d4bc9272a31af47365f4526b14f54815b3d).




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687240165


   **[Test build #128309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128309/testReport)** for PR 29604 at commit [`addcdbc`](https://github.com/apache/spark/commit/addcdbc98b89ddaa58722be5687cd857b2a22d33).




[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r488719491



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the

Review comment:
       nit: we can remove it as it just repeats the usage doc.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in

Review comment:
       `offset - a positive int literal to indicate the offset in the window frame. It starts with 1.`

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)

Review comment:
       BTW, which window frames does `OffsetWindowFunction` support?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.dataType match {
+        case IntegerType | ShortType | ByteType =>
+          offsetExpr.eval().asInstanceOf[Int] match {
+            case i: Int if i <= 0 => TypeCheckFailure(
+              s"The 'offset' argument of nth_value must be greater than zero but it is $i.")
+            case _ => TypeCheckSuccess
+          }
+        case _ => TypeCheckFailure(
+          s"The 'offset' parameter must be a int literal but it is ${offsetExpr.dataType}.")
+      }
+    }
+  }
+
+  private lazy val offset = offsetExpr.eval().asInstanceOf[Int].toLong
+  private lazy val result = AttributeReference("result", input.dataType)()
+  private lazy val count = AttributeReference("count", LongType)()
+  private lazy val valueSet = AttributeReference("valueSet", BooleanType)()
+  override lazy val aggBufferAttributes: Seq[AttributeReference] =
+    result :: count :: valueSet :: Nil
+
+  override lazy val initialValues: Seq[Literal] = Seq(
+    /* result = */ Literal.create(null, input.dataType),
+    /* count = */ Literal(1L),
+    /* valueSet = */ Literal.create(false, BooleanType)
+  )
+
+  override lazy val updateExpressions: Seq[Expression] = {
+    if (ignoreNulls) {
+      Seq(
+        /* result = */ If(valueSet || input.isNull || count < offset, result, input),
+        /* count = */ If(input.isNull, count, count + 1L),
+        /* valueSet = */ valueSet || (input.isNotNull && count >= offset)
+      )
+    } else {
+      Seq(
+        /* result = */ If(valueSet || count < offset, result, input),
+        /* count = */ count + 1L,
+        /* valueSet = */ valueSet || count >= offset

Review comment:
       and `count` should start from 0, so that we at least update `result` once.
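       To make the buffer updates in the hunk above easier to reason about, here is a plain-Scala sketch that mirrors the three update expressions outside of Spark. It assumes that every update expression reads the buffer as it was before the current row, and it models nullable values as `Option[Int]`. `NthValueBufferSketch`, `Buffer`, `update`, and `nthValue` are hypothetical names used only for illustration; this is not the code under review.

       ```scala
       // Illustrative sketch only; plain Scala, no Spark dependency.
       object NthValueBufferSketch {
         // Mirrors aggBufferAttributes: result, count, valueSet.
         final case class Buffer(result: Option[Int], count: Long, valueSet: Boolean)

         // Initial values as written in the diff: result = null, count = 1, valueSet = false.
         val initial: Buffer = Buffer(None, 1L, false)

         // One update step. Each field is computed from the OLD buffer `b` (assumption).
         def update(b: Buffer, input: Option[Int], offset: Long, ignoreNulls: Boolean): Buffer =
           if (ignoreNulls) {
             Buffer(
               result = if (b.valueSet || input.isEmpty || b.count < offset) b.result else input,
               count = if (input.isEmpty) b.count else b.count + 1L,
               valueSet = b.valueSet || (input.isDefined && b.count >= offset))
           } else {
             Buffer(
               result = if (b.valueSet || b.count < offset) b.result else input,
               count = b.count + 1L,
               valueSet = b.valueSet || b.count >= offset)
           }

         // Folds the rows of one frame through the buffer and returns the final result.
         def nthValue(rows: Seq[Option[Int]], offset: Long, ignoreNulls: Boolean): Option[Int] =
           rows.foldLeft(initial)((b, in) => update(b, in, offset, ignoreNulls)).result
       }
       ```

       Stepping rows such as `Seq(Some(10), None, Some(30))` through `nthValue` with different offsets, with and without `ignoreNulls`, is a quick way to compare the current initial value of `count` against an alternative starting point.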

##########
File path: sql/core/src/test/resources/sql-tests/inputs/window.sql
##########
@@ -124,4 +144,26 @@ WINDOW w AS (PARTITION BY cate ORDER BY val);
 -- with filter predicate
 SELECT val, cate,
 count(val) FILTER (WHERE val > 1) OVER(PARTITION BY cate)
-FROM testData ORDER BY cate, val;
\ No newline at end of file
+FROM testData ORDER BY cate, val;
+
+-- nth_value() over ()
+SELECT
+    employee_name,
+    salary,
+    nth_value(employee_name, 2) OVER (ORDER BY salary DESC) second_highest_salary
+FROM
+    basic_pays
+ORDER BY salary DESC;
+
+SELECT
+	employee_name,
+	department,
+	salary,
+	NTH_VALUE(employee_name, 2) OVER  (
+		PARTITION BY department
+		ORDER BY salary DESC
+		RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Review comment:
       Can we test more frame boundaries?
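       As a reference for what additional frame boundaries should produce, the sketch below computes `nth_value` over a `ROWS BETWEEN ... PRECEDING AND ... FOLLOWING` frame in plain Scala. It assumes the partition is already ordered and only models row-based frames (not `RANGE`); `FrameSketch` and `nthValueOverRows` are hypothetical names, and `None` as a bound stands for `UNBOUNDED`.

       ```scala
       // Illustrative sketch only; plain Scala, no Spark dependency.
       object FrameSketch {
         // For each row i of an ordered partition, build the frame
         // [i - preceding, i + following], clamped to the partition,
         // and return the value at 1-based position `offset` within it.
         // A bound of None means UNBOUNDED on that side.
         def nthValueOverRows[A](
             ordered: Seq[A],
             offset: Int,
             preceding: Option[Int],
             following: Option[Int]): Seq[Option[A]] = {
           require(offset > 0, "offset must be greater than zero")
           val last = ordered.length - 1
           ordered.indices.map { i =>
             val start = preceding.fold(0)(p => math.max(0L, i.toLong - p).toInt)
             val end = following.fold(last)(f => math.min(last.toLong, i.toLong + f).toInt)
             // slice takes an exclusive upper bound; lift returns None when the frame is too short.
             ordered.slice(start, end + 1).lift(offset - 1)
           }
         }
       }
       ```

       For example, `FrameSketch.nthValueOverRows(Seq(100, 200, 300, 400), 2, Some(1), Some(0))` models `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW`, where the first row has a one-row frame and therefore no second value.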

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null

Review comment:
       This doesn't match the usage doc: ```If the value of `input` at the `offset`th row is null, null is returned```

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.dataType match {
+        case IntegerType | ShortType | ByteType =>
+          offsetExpr.eval().asInstanceOf[Int] match {
+            case i: Int if i <= 0 => TypeCheckFailure(
+              s"The 'offset' argument of nth_value must be greater than zero but it is $i.")
+            case _ => TypeCheckSuccess
+          }
+        case _ => TypeCheckFailure(
+          s"The 'offset' parameter must be a int literal but it is ${offsetExpr.dataType}.")
+      }
+    }
+  }
+
+  private lazy val offset = offsetExpr.eval().asInstanceOf[Int].toLong
+  private lazy val result = AttributeReference("result", input.dataType)()
+  private lazy val count = AttributeReference("count", LongType)()
+  private lazy val valueSet = AttributeReference("valueSet", BooleanType)()
+  override lazy val aggBufferAttributes: Seq[AttributeReference] =
+    result :: count :: valueSet :: Nil
+
+  override lazy val initialValues: Seq[Literal] = Seq(
+    /* result = */ Literal.create(null, input.dataType),
+    /* count = */ Literal(1L),
+    /* valueSet = */ Literal.create(false, BooleanType)

Review comment:
       I don't think it's worth an extra boolean slot just to save the calculation of `count >= offset`.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)

Review comment:
       can we add a TODO to optimize it using `OffsetWindowFunction`?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else {
+      offsetExpr.dataType match {
+        case IntegerType | ShortType | ByteType =>
+          offsetExpr.eval().asInstanceOf[Int] match {
+            case i: Int if i <= 0 => TypeCheckFailure(
+              s"The 'offset' argument of nth_value must be greater than zero but it is $i.")
+            case _ => TypeCheckSuccess
+          }
+        case _ => TypeCheckFailure(
+          s"The 'offset' parameter must be a int literal but it is ${offsetExpr.dataType}.")
+      }
+    }
+  }
+
+  private lazy val offset = offsetExpr.eval().asInstanceOf[Int].toLong
+  private lazy val result = AttributeReference("result", input.dataType)()
+  private lazy val count = AttributeReference("count", LongType)()
+  private lazy val valueSet = AttributeReference("valueSet", BooleanType)()
+  override lazy val aggBufferAttributes: Seq[AttributeReference] =
+    result :: count :: valueSet :: Nil
+
+  override lazy val initialValues: Seq[Literal] = Seq(
+    /* result = */ Literal.create(null, input.dataType),
+    /* count = */ Literal(1L),
+    /* valueSet = */ Literal.create(false, BooleanType)
+  )
+
+  override lazy val updateExpressions: Seq[Expression] = {
+    if (ignoreNulls) {
+      Seq(
+        /* result = */ If(valueSet || input.isNull || count < offset, result, input),
+        /* count = */ If(input.isNull, count, count + 1L),
+        /* valueSet = */ valueSet || (input.isNotNull && count >= offset)
+      )
+    } else {
+      Seq(
+        /* result = */ If(valueSet || count < offset, result, input),
+        /* count = */ count + 1L,
+        /* valueSet = */ valueSet || count >= offset

Review comment:
       I'd expect something like
   ```
   Seq(
     /* result = */ If(count < offset, input, result),
     /* count = */ count + 1L
   )
   ```
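
   The two-slot buffer proposed above can be modelled without Spark. The following is a minimal sketch, not the PR's implementation: `NthValueSketch`, `Buffer`, and `nthValue` are hypothetical names, rows are `Option[Int]` with `None` standing in for SQL null, and it assumes the result is captured only when `count` equals `offset` (an equality test rather than the `count < offset` written above), with `count` starting at 1.

   ```scala
   // Plain-Scala model of a (result, count) aggregation buffer; no Spark dependency.
   // All names here are illustrative and do not appear in the PR.
   object NthValueSketch {
     final case class Buffer(result: Option[Int], count: Long)

     def nthValue(frame: Seq[Option[Int]], offset: Long, ignoreNulls: Boolean): Option[Int] = {
       require(offset > 0, "offset must be greater than zero")
       // count starts at 1, so the first counted row has position 1.
       val init = Buffer(result = None, count = 1L)
       frame.foldLeft(init) { (buf, input) =>
         // Both slots are computed from the OLD buffer, as update expressions are.
         if (ignoreNulls) {
           Buffer(
             result = if (buf.count == offset && input.isDefined) input else buf.result,
             // Null rows do not advance the position when nulls are skipped.
             count = if (input.isEmpty) buf.count else buf.count + 1L)
         } else {
           Buffer(
             result = if (buf.count == offset) input else buf.result,
             count = buf.count + 1L)
         }
       }.result
     }
   }
   ```

   With a frame of `10, null, 30, 40` and offset 2, this sketch yields null when every row counts and 30 when nulls are skipped. An offset beyond the frame size leaves the initial null in place.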




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694696950


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690971728


   **[Test build #128563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128563/testReport)** for PR 29604 at commit [`2aee591`](https://github.com/apache/spark/commit/2aee5916770e89e7257b6c7936091746f14faace).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-684159399








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686299289


   **[Test build #128220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128220/testReport)** for PR 29604 at commit [`bc0c308`](https://github.com/apache/spark/commit/bc0c308ee8fa934c2e5a588a7460f7d8045fba62).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690972455








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694006701


   **[Test build #128787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128787/testReport)** for PR 29604 at commit [`4002aaf`](https://github.com/apache/spark/commit/4002aaf4f4558b939fd5481f12a4e29fd8e868d5).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694696481


   **[Test build #128845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128845/testReport)** for PR 29604 at commit [`501d564`](https://github.com/apache/spark/commit/501d5645a16e5477ca5e84fb7638a912abab829b).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-690972455








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-687311186








[GitHub] [spark] AmplabJenkins commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693214552








[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r495661956



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,80 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offset starts at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an `offset`th row (e.g., when the offset is 10, size of the window frame
+      is less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - a positive int literal to indicate the offset in the window frame. It starts
+          with 1.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: offsetExpr :: Nil
+
+  override val frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val check = super.checkInputDataTypes()
+    if (check.isFailure) {
+      check
+    } else if (!offsetExpr.foldable) {
+      TypeCheckFailure(s"Offset expression '$offsetExpr' must be a literal.")
+    } else if (offset <= 0) {
+      TypeCheckFailure(
+        s"The 'offset' argument of nth_value must be greater than zero but it is $offset.")
+    } else {
+      TypeCheckSuccess
+    }
+  }
+
+  private lazy val offset = offsetExpr.eval().asInstanceOf[Int].toLong
+  private lazy val result = AttributeReference("result", input.dataType)()
+  private lazy val count = AttributeReference("count", LongType)()
+  override lazy val aggBufferAttributes: Seq[AttributeReference] = result :: count :: Nil
+
+  override lazy val initialValues: Seq[Literal] = Seq(
+    /* result = */ Literal.create(null, input.dataType),
+    /* count = */ Literal(1L)
+  )
+
+  override lazy val updateExpressions: Seq[Expression] = {
+    if (ignoreNulls) {
+      Seq(
+        /* result = */ If(count === offset && input.isNotNull, input, result),
+        /* count = */ If(input.isNull, count, count + 1L)
+      )
+    } else {
+      Seq(
+        /* result = */ If(count === offset, input, result),
+        /* count = */ count + 1L
+      )
+    }
+  }
+
+  override lazy val evaluateExpression: AttributeReference = result
+
+  override def toString: String = s"$prettyName($input, $offset)${if (ignoreNulls) " ignore nulls"}"

Review comment:
       Thanks for the reminder. I will add `prettyName` and `sql`.
   We will refactor FIRST/FIRST_VALUE in SqlBase.g4.
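
   One detail in the `toString` above may be worth checking when adding `sql`: in Scala, an `if` without an `else` inside an interpolated string evaluates to the unit value when the condition is false, which prints as `()`. A minimal sketch, using a hypothetical `ToStringSketch` object and a hard-coded `nth_value(input, 2)` prefix in place of the real expression:

   ```scala
   // Illustrative only: the object and method names are not part of the PR.
   object ToStringSketch {
     // Mirrors the shape of the interpolation in the hunk: no else branch.
     def withoutElse(ignoreNulls: Boolean): String =
       s"nth_value(input, 2)${if (ignoreNulls) " ignore nulls"}"

     // An explicit empty-string else branch avoids the stray "()".
     def withElse(ignoreNulls: Boolean): String =
       s"nth_value(input, 2)${if (ignoreNulls) " ignore nulls" else ""}"
   }
   ```

   Under this sketch, `withoutElse(false)` produces `nth_value(input, 2)()`, while `withElse(false)` produces `nth_value(input, 2)`.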
   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r481934014



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##########
@@ -993,6 +993,64 @@ object functions {
     Lead(e.expr, Literal(offset), Literal(defaultValue))
   }
 
+  /**
+   * Window function: returns the value that is the `offset`th row of the window frame
+   * (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+   *
+   * It will return the `offset`th non-null value it sees when ignoreNulls is set to true.
+   * If all values are null, then null is returned.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.1.0
+   */
+  def nth_value(columnName: String, offset: Int, ignoreNulls: Boolean): Column = {
+    nth_value(Column(columnName), offset, ignoreNulls)
+  }
+
+  /**
+   * Window function: returns the value that is the `offset`th row of the window frame
+   * (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+   *
+   * It will return the `offset`th non-null value it sees when ignoreNulls is set to true.
+   * If all values are null, then null is returned.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.1.0
+   */
+  def nth_value(e: Column, offset: Int, ignoreNulls: Boolean): Column = withExpr {
+    NthValue(e.expr, Literal(offset), ignoreNulls)
+  }
+
+  /**
+   * Window function: returns the value that is the `offset`th row of the window frame
+   * (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.1.0
+   */
+  def nth_value(columnName: String, offset: Int): Column = {

Review comment:
       Let's remove `String` signature version and only keep the one of `Column` for now.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-694025981








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693281188








[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-686202570


   **[Test build #128220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128220/testReport)** for PR 29604 at commit [`bc0c308`](https://github.com/apache/spark/commit/bc0c308ee8fa934c2e5a588a7460f7d8045fba62).




[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r490664423



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,81 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less

Review comment:
       OK






[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489145718



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)

Review comment:
       `OffsetWindowFunction` does not support a user-specified frame at the moment.
   `OffsetWindowFunction` restricts its frame with
   ```
     override lazy val frame: WindowFrame = {
       val boundary = direction match {
         case Ascending => offset
         case Descending => UnaryMinus(offset) match {
             case e: Expression if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType)
             case o => o
         }
       }
       SpecifiedWindowFrame(RowFrame, boundary, boundary)
     }
   ```
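
   To illustrate the restriction: the snippet above pins both frame boundaries to the same offset, so the frame always covers exactly one row. A simplified model, assuming plain integer offsets and stand-in types (`Dir`, `Asc`, `Desc`, `SingleRowFrame`) that mirror, but are not, Spark's classes:

   ```scala
   // Stand-in types for illustration; these are not Spark's SortDirection or window frame classes.
   object OffsetFrameSketch {
     sealed trait Dir
     case object Asc extends Dir
     case object Desc extends Dir

     // A row-based frame with lower and upper bounds relative to the current row.
     final case class SingleRowFrame(lower: Int, upper: Int)

     def frameFor(direction: Dir, offset: Int): SingleRowFrame = {
       // Ascending looks forward by `offset` rows, descending looks backward.
       val boundary = direction match {
         case Asc => offset
         case Desc => -offset
       }
       // Lower and upper bounds are identical, so the frame is a single row.
       SingleRowFrame(boundary, boundary)
     }
   }
   ```

   Because the bounds are derived from the offset alone, a frame such as the `range between ... preceding and ... following` used with `nth_value` has nowhere to go in this model.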






[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489234526



##########
File path: sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part3.sql.out
##########
@@ -385,6 +385,15 @@ org.apache.spark.sql.AnalysisException
 cannot resolve 'ntile(0)' due to data type mismatch: Buckets expression must be positive, but got: 0; line 1 pos 7
 
 
+-- !query
+SELECT nth_value(four, 0) OVER (ORDER BY ten), ten, four FROM tenk1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'nthvalue(spark_catalog.default.tenk1.`four`)' due to data type mismatch: The 'offset' argument of nth_value must be greater than zero but it is 0.; line 1 pos 7

Review comment:
       Does pgsql also fail for this query?






[GitHub] [spark] cloud-fan commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r490217202



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,81 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less
+      than 10), null is returned.

Review comment:
       less than 10 -> is less than 10
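
   For reference, the semantics described in the usage text can be modeled with a small sketch in plain Scala. This is only an illustration of the documented behavior, not the implementation in this PR; `nthValue` and its `Option`-based encoding of SQL null are assumptions made for the example.

   ```scala
   // Hypothetical model of the documented semantics, not the Spark implementation.
   // `frame` holds the rows of the window frame in order; None stands for SQL null.
   def nthValue[A](frame: Seq[Option[A]], offset: Int, ignoreNulls: Boolean = false): Option[A] = {
     // Offsets start at 1, so zero and negative offsets are rejected.
     require(offset > 0, s"offset must be greater than zero but it is $offset")
     // With ignoreNulls, nulls are skipped before counting; otherwise every row counts.
     val candidates = if (ignoreNulls) frame.filter(_.isDefined) else frame
     // If the frame has fewer than `offset` candidate rows, the result is null (None).
     candidates.lift(offset - 1).flatten
   }

   val frame = Seq(Some(10), None, Some(30))

   val respectNulls = nthValue(frame, 2)                    // second row is null
   val skipNulls = nthValue(frame, 2, ignoreNulls = true)   // second non-null row
   val beyondFrame = nthValue(frame, 10)                    // frame size is less than 10
   ```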






[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489123677



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,96 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row from beginning of the
+ * window frame. Offset starts at 1. When the value of `input` is null at the `offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If the value of `input` at the
+      `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the
+      offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - an int expression which determines the row number relative to the first row in
+          the window for which to return the expression. The offset can be a constant or an
+          expression and must be a positive integer that is greater than 0.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null

Review comment:
       How about `If the value of `input` at the `offset`th row is null, null is returned (respecting nulls)`






[GitHub] [spark] beliefer commented on a change in pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29604:
URL: https://github.com/apache/spark/pull/29604#discussion_r489245255



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##########
@@ -549,6 +549,85 @@ case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction {
   override def prettyName: String = "cume_dist"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+      from beginning of the window frame. Offsets start at 1. If ignoreNulls=true, we will skip
+      nulls when finding the `offset`th row. Otherwise, every row counts for the `offset`. If
+      there is no such an offset row (e.g., when the offset is 10, size of the window frame less
+      than 10), null is returned.
+  """,
+  arguments = """
+    Arguments:
+      * input - the target column or expression that the function operates on.
+      * offset - a positive int literal to indicate the offset in the window frame. It starts with 1.
+      * ignoreNulls - an optional specification that indicates the NthValue should skip null
+          values in the determination of which row to use.
+  """,
+  since = "3.1.0",
+  group = "window_funcs")
+case class NthValue(input: Expression, offsetExpr: Expression, ignoreNulls: Boolean)
+    extends AggregateWindowFunction with ImplicitCastInputTypes {
+
+  def this(child: Expression, offset: Expression) = this(child, offset, false)
+
+  override def children: Seq[Expression] = input :: Nil
+
+  override def frame: WindowFrame = UnspecifiedFrame
+
+  override def dataType: DataType = input.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType, BooleanType)

Review comment:
       OK






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693217800


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128749/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29604: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29604:
URL: https://github.com/apache/spark/pull/29604#issuecomment-693258202


   **[Test build #128756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128756/testReport)** for PR 29604 at commit [`db2b1d4`](https://github.com/apache/spark/commit/db2b1d4bc9272a31af47365f4526b14f54815b3d).

