Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/29 03:47:12 UTC

[GitHub] [spark] beliefer opened a new pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

beliefer opened a new pull request #30178:
URL: https://github.com/apache/spark/pull/30178


   ### What changes were proposed in this pull request?
   https://github.com/apache/spark/pull/29800 provides a performance improvement for `NTH_VALUE`.
   `FIRST_VALUE` could also use the `UnboundedOffsetWindowFunctionFrame` and `UnboundedPrecedingOffsetWindowFunctionFrame`.
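
To illustrate why an offset-style frame can help here, the following is a minimal sketch in plain Scala collections. It is a toy model, not Spark's frame implementation: `FirstValueSketch`, `firstValueNaive`, and `firstValueByOffset` are hypothetical names, the naive version is only a baseline for comparison (not a description of Spark's previous behaviour), and null handling is not modeled. The idea it shows is that when the frame starts at the beginning of the partition, `FIRST_VALUE` is the same for every row, so it can be looked up once, like `NTH_VALUE` with offset 1.

```scala
// Toy model only: plain Scala collections, not Spark's window frame classes.
object FirstValueSketch {

  // Baseline: for each row, rebuild the frame
  // ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW and take its first element.
  def firstValueNaive[A](partition: Vector[A]): Vector[Option[A]] =
    partition.indices.toVector.map { i =>
      partition.take(i + 1).headOption
    }

  // Offset-style: the frame always starts at the first row of the partition,
  // so the result is identical for every row and is looked up only once.
  def firstValueByOffset[A](partition: Vector[A]): Vector[Option[A]] = {
    val first = partition.headOption
    partition.map(_ => first)
  }

  def main(args: Array[String]): Unit = {
    val partition = Vector(30, 10, 20)
    // Both strategies agree; only the amount of work per row differs.
    assert(firstValueNaive(partition) == firstValueByOffset(partition))
    println(firstValueByOffset(partition))
  }
}
```

In this model the baseline does work proportional to the frame size for every row, while the offset version does a single lookup per partition.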
   
   
   ### Why are the changes needed?
   Improve the performance for `FIRST_VALUE`.
   
   
   ### Does this PR introduce _any_ user-facing change?
    'No'.
   
   
   ### How was this patch tested?
   Jenkins test.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725512545








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725373573


   **[Test build #130934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130934/testReport)** for PR 30178 at commit [`fd7e02e`](https://github.com/apache/spark/commit/fd7e02eccb5c4ac6112e34a43634c8f656a447d3).




[GitHub] [spark] cloud-fan commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521389613



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.First
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+class OptimizeWindowFunctionsSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("OptimizeWindowFunctions", FixedPoint(10),
+        OptimizeWindowFunctions) :: Nil
+  }
+
+  test("check OptimizeWindowFunctions") {

Review comment:
       let's add a negative test: if the window frame is ordered, don't optimize.
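
The shape of such a negative test can be sketched with a toy model in plain Scala. Everything here is hypothetical (`NegativeTestSketch`, `WindowCall`, `optimize`); it does not use Catalyst, and the guard simply mirrors the condition stated in the comment above, since the actual rule's condition is not shown in this thread. The point is only the pattern: one assertion that the rewrite fires when the guard holds, and one that the input comes back unchanged when it does not.

```scala
// Toy model only: hypothetical types, not Catalyst expressions or rules.
object NegativeTestSketch {
  sealed trait WindowFunc
  case object FirstValue extends WindowFunc
  case object NthValueOne extends WindowFunc

  // A window call reduced to the two facts this sketch cares about.
  final case class WindowCall(func: WindowFunc, ordered: Boolean)

  // Guarded rewrite: only the unordered FIRST_VALUE case is replaced,
  // following the condition quoted in the review comment.
  def optimize(call: WindowCall): WindowCall = call match {
    case WindowCall(FirstValue, false) => call.copy(func = NthValueOne)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // Positive case: the rewrite applies.
    assert(optimize(WindowCall(FirstValue, ordered = false)) ==
      WindowCall(NthValueOne, ordered = false))
    // Negative case: the guard fails, so the call must be left untouched.
    val orderedCall = WindowCall(FirstValue, ordered = true)
    assert(optimize(orderedCall) == orderedCall)
  }
}
```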






[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724475678


   **[Test build #130826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130826/testReport)** for PR 30178 at commit [`2f3fbda`](https://github.com/apache/spark/commit/2f3fbda5857d861021aa272d3dec085def4f638b).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725868607








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723745492


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130762/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724581136








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718691371








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723967193


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723025136


   **[Test build #130709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130709/testReport)** for PR 30178 at commit [`879d6c7`](https://github.com/apache/spark/commit/879d6c7687e57004cc7f5925e53afb11e2064f9f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725458289








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725816099


   Merged build finished. Test PASSed.




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718688759


   **[Test build #130397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130397/testReport)** for PR 30178 at commit [`181186c`](https://github.com/apache/spark/commit/181186c6cce9c3b4e3061dc84b667ee898dd3f40).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723736454








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725868607


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718445051


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35001/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724578203








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724581136


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723011048


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35319/
   




[GitHub] [spark] beliefer commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521779003



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.First
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+class OptimizeWindowFunctionsSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("OptimizeWindowFunctions", FixedPoint(10),
+        OptimizeWindowFunctions) :: Nil
+  }
+
+  test("check OptimizeWindowFunctions") {

Review comment:
       OK






[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718356033


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34996/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725412421


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35536/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724476717








[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724420337


   **[Test build #130826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130826/testReport)** for PR 30178 at commit [`2f3fbda`](https://github.com/apache/spark/commit/2f3fbda5857d861021aa272d3dec085def4f638b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725412440








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723745486


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725841076


   **[Test build #130967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130967/testReport)** for PR 30178 at commit [`68d3388`](https://github.com/apache/spark/commit/68d3388001615841685e9942c4220d7904f33665).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724581145


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35457/
   Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723723310


   **[Test build #130764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130764/testReport)** for PR 30178 at commit [`7b99d27`](https://github.com/apache/spark/commit/7b99d2720b38e5f9a67a9c0810cde23b6e1ae797).




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725370017


   **[Test build #130933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130933/testReport)** for PR 30178 at commit [`f851a4c`](https://github.com/apache/spark/commit/f851a4ceae8eaca7a1460f48f0266b3fd42519af).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725427516








[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725902781








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718342791


   **[Test build #130393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130393/testReport)** for PR 30178 at commit [`181186c`](https://github.com/apache/spark/commit/181186c6cce9c3b4e3061dc84b667ee898dd3f40).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723739706


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35373/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723723310


   **[Test build #130764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130764/testReport)** for PR 30178 at commit [`7b99d27`](https://github.com/apache/spark/commit/7b99d2720b38e5f9a67a9c0810cde23b6e1ae797).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724560196


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35453/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725924793


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35583/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-722958337


   **[Test build #130709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130709/testReport)** for PR 30178 at commit [`879d6c7`](https://github.com/apache/spark/commit/879d6c7687e57004cc7f5925e53afb11e2064f9f).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725911942








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723745421


   **[Test build #130762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130762/testReport)** for PR 30178 at commit [`57c7ef1`](https://github.com/apache/spark/commit/57c7ef1fe17fe7448505f8b1976f153b275125af).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723733146


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35373/
   




[GitHub] [spark] cloud-fan closed pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #30178:
URL: https://github.com/apache/spark/pull/30178


   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724679268


   **[Test build #130849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130849/testReport)** for PR 30178 at commit [`2f3fbda`](https://github.com/apache/spark/commit/2f3fbda5857d861021aa272d3dec085def4f638b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725816104


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35565/
   Test PASSed.




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718408701


   **[Test build #130397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130397/testReport)** for PR 30178 at commit [`181186c`](https://github.com/apache/spark/commit/181186c6cce9c3b4e3061dc84b667ee898dd3f40).




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724536515


   **[Test build #130849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130849/testReport)** for PR 30178 at commit [`2f3fbda`](https://github.com/apache/spark/commit/2f3fbda5857d861021aa272d3dec085def4f638b).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725370017


   **[Test build #130933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130933/testReport)** for PR 30178 at commit [`f851a4c`](https://github.com/apache/spark/commit/f851a4ceae8eaca7a1460f48f0266b3fd42519af).




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725427516








[GitHub] [spark] beliefer commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718407326


   retest this please




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725960801


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35586/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723823063


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35382/
   




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723788824


   **[Test build #130773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130773/testReport)** for PR 30178 at commit [`0c953ff`](https://github.com/apache/spark/commit/0c953ff5e8b249debcf0fe2050840afdbb176ad8).




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723736454








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724581123


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35457/
   




[GitHub] [spark] beliefer commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725913697


   retest this please




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723895898


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35386/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723967214


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130778/
   Test FAILed.




[GitHub] [spark] beliefer commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521828212



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.First
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+class OptimizeWindowFunctionsSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("OptimizeWindowFunctions", FixedPoint(10),
+        OptimizeWindowFunctions) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.double, 'b.double, 'c.string)
+  val a = testRelation.output(0)
+  val b = testRelation.output(1)
+  val c = testRelation.output(2)
+
+  test("replace first(col) by nth_value(col, 1) if the window frame is ordered") {
+    val inputPlan = testRelation.select(
+      WindowExpression(
+        First(a, false).toAggregateExpression(),
+        WindowSpecDefinition(b :: Nil, c.asc :: Nil,
+          SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))))

Review comment:
       Good question. In the case of `RangeFrame`, there is no need to convert `first` to `nth_value`.
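
       The condition discussed in this review comment can be illustrated with a small, self-contained sketch. It is a toy model only: the case classes below are hypothetical stand-ins that borrow Catalyst's names, not Spark's real classes, and the guard (ordered window, `RowFrame`, lower bound `UnboundedPreceding`) is an assumption inferred from the test title and this comment rather than a copy of the PR's rule.

```scala
// Toy model: hypothetical stand-ins for Catalyst classes, for illustration only.
sealed trait FrameType
case object RowFrame extends FrameType
case object RangeFrame extends FrameType

sealed trait Boundary
case object UnboundedPreceding extends Boundary
case object CurrentRow extends Boundary
case object UnboundedFollowing extends Boundary

sealed trait WindowFunction
case class First(column: String, ignoreNulls: Boolean) extends WindowFunction
case class NthValue(column: String, offset: Int, ignoreNulls: Boolean) extends WindowFunction

case class Frame(frameType: FrameType, lower: Boundary, upper: Boundary)
case class WindowExpr(function: WindowFunction, orderBy: Seq[String], frame: Frame)

object OptimizeFirstSketch {
  // Rewrite first(col) to nth_value(col, 1) only for an ordered window whose
  // row-based frame starts at UNBOUNDED PRECEDING (assumed guard, see lead-in).
  // Range frames and unordered windows are returned unchanged.
  def rewrite(w: WindowExpr): WindowExpr = w match {
    case WindowExpr(First(col, ignoreNulls), orderBy, Frame(RowFrame, UnboundedPreceding, _))
        if orderBy.nonEmpty =>
      w.copy(function = NthValue(col, 1, ignoreNulls))
    case _ => w
  }
}
```

       With these stand-ins, `OptimizeFirstSketch.rewrite` turns a `First` over a `RowFrame` into `NthValue(col, 1, ignoreNulls)` and leaves the same expression over a `RangeFrame` untouched, which is the behaviour the comment describes.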






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725500528


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130933/
   Test FAILed.


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718462684


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35001/
   


[GitHub] [spark] beliefer commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724536194


   retest this please


[GitHub] [spark] beliefer commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724507630


   retest this please


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724476717


   Merged build finished. Test FAILed.


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724437876


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35435/
   


[GitHub] [spark] beliefer commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r526545753



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -806,6 +807,18 @@ object CollapseRepartition extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Replaces first(col) with nth_value(col, 1) for better performance.
+ */
+object OptimizeWindowFunctions extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+    case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
+      if spec.orderSpec.nonEmpty &&
+        spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame].frameType == RowFrame =>

Review comment:
       Skipping the `UnboundedPreceding` check is harmless.
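One possible reading of this comment: for a row frame, `first(col)` and `nth_value(col, 1)` pick the same row whatever the lower bound is, so leaving out the bound check would not change query results. The sketch below illustrates that reading with a plain Scala model. The names `FirstVsNthValueSketch`, `frame`, `first`, and `nthValue` are hypothetical, null handling (`ignoreNulls`) is not modeled, and nothing here measures performance.

```scala
// Hypothetical model of a ROWS frame ending at the current row; not Spark code.
object FirstVsNthValueSketch {
  // rows: one partition, already sorted by the window ordering.
  // The frame for the row at index i covers [max(0, i - preceding), i];
  // preceding = None models UNBOUNDED PRECEDING.
  def frame[A](rows: Vector[A], i: Int, preceding: Option[Int]): Vector[A] = {
    val start = preceding.map(p => math.max(0, i - p)).getOrElse(0)
    rows.slice(start, i + 1)
  }

  // first(col): the first row of the frame.
  def first[A](frame: Vector[A]): Option[A] = frame.headOption

  // nth_value(col, n): the n-th row of the frame, with n counted from 1.
  def nthValue[A](frame: Vector[A], n: Int): Option[A] = frame.lift(n - 1)
}
```

For any partition and any lower bound in this model, `first(frame(rows, i, p))` equals `nthValue(frame(rows, i, p), 1)`.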




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718462709






[GitHub] [spark] beliefer commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521774289



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -806,6 +807,18 @@ object CollapseRepartition extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Substitute the aggregate expression which uses [[First]] as the aggregate function

Review comment:
       OK




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723840203


   **[Test build #130773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130773/testReport)** for PR 30178 at commit [`0c953ff`](https://github.com/apache/spark/commit/0c953ff5e8b249debcf0fe2050840afdbb176ad8).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725512545


   Merged build finished. Test FAILed.


[GitHub] [spark] beliefer commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725084965


   cc @cloud-fan 


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725500517


   Merged build finished. Test FAILed.


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724534425


   **[Test build #130845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130845/testReport)** for PR 30178 at commit [`2f3fbda`](https://github.com/apache/spark/commit/2f3fbda5857d861021aa272d3dec085def4f638b).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725914668


   **[Test build #130980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130980/testReport)** for PR 30178 at commit [`3a7f4e7`](https://github.com/apache/spark/commit/3a7f4e740eb5a9cecf880bc5cc294b2459e98cf1).


[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718362027






[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725816099






[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718362021


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34996/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724476729


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130826/
   Test FAILed.


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723882781


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35386/
   


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724563333


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35457/
   


[GitHub] [spark] beliefer commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r526545753



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -806,6 +807,18 @@ object CollapseRepartition extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Replaces first(col) with nth_value(col, 1) for better performance.
+ */
+object OptimizeWindowFunctions extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+    case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
+      if spec.orderSpec.nonEmpty &&
+        spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame].frameType == RowFrame =>

Review comment:
       OK




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-722958337


   **[Test build #130709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130709/testReport)** for PR 30178 at commit [`879d6c7`](https://github.com/apache/spark/commit/879d6c7687e57004cc7f5925e53afb11e2064f9f).


[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723967193






[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725373573


   **[Test build #130934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130934/testReport)** for PR 30178 at commit [`fd7e02e`](https://github.com/apache/spark/commit/fd7e02eccb5c4ac6112e34a43634c8f656a447d3).


[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723025652






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725868615


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35573/
   Test FAILed.


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725456994


   **[Test build #130930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130930/testReport)** for PR 30178 at commit [`e296eb6`](https://github.com/apache/spark/commit/e296eb65acb5ee30045c71e54777216cc1ebd243).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723731152


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35371/
   


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725856249


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35573/
   


[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725902747






[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725940177


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35583/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723758691


   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725442665








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724508955


   **[Test build #130845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130845/testReport)** for PR 30178 at commit [`2f3fbda`](https://github.com/apache/spark/commit/2f3fbda5857d861021aa272d3dec085def4f638b).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725417561


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35539/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723721434


   **[Test build #130762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130762/testReport)** for PR 30178 at commit [`57c7ef1`](https://github.com/apache/spark/commit/57c7ef1fe17fe7448505f8b1976f153b275125af).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725427494


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35539/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724536515


   **[Test build #130849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130849/testReport)** for PR 30178 at commit [`2f3fbda`](https://github.com/apache/spark/commit/2f3fbda5857d861021aa272d3dec085def4f638b).




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725322600


   **[Test build #130930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130930/testReport)** for PR 30178 at commit [`e296eb6`](https://github.com/apache/spark/commit/e296eb65acb5ee30045c71e54777216cc1ebd243).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725411540


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35535/
   




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718342791


   **[Test build #130393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130393/testReport)** for PR 30178 at commit [`181186c`](https://github.com/apache/spark/commit/181186c6cce9c3b4e3061dc84b667ee898dd3f40).




[GitHub] [spark] cloud-fan commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521808606



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.First
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+class OptimizeWindowFunctionsSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("OptimizeWindowFunctions", FixedPoint(10),
+        OptimizeWindowFunctions) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.double, 'b.double, 'c.string)
+  val a = testRelation.output(0)
+  val b = testRelation.output(1)
+  val c = testRelation.output(2)
+
+  test("replace first(col) by nth_value(col, 1) if the window frame is ordered") {
+    val inputPlan = testRelation.select(
+      WindowExpression(
+        First(a, false).toAggregateExpression(),
+        WindowSpecDefinition(b :: Nil, c.asc :: Nil,
+          SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))))

Review comment:
       How about `RangeFrame`?






[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725797504


   **[Test build #130959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130959/testReport)** for PR 30178 at commit [`72ceacc`](https://github.com/apache/spark/commit/72ceacc2677c66d884d5e4f4ac124c3f61975edd).




[GitHub] [spark] beliefer commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723853552


   retest this please




[GitHub] [spark] cloud-fan commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521890293



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.First
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+class OptimizeWindowFunctionsSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("OptimizeWindowFunctions", FixedPoint(10),
+        OptimizeWindowFunctions) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.double, 'b.double, 'c.string)
+  val a = testRelation.output(0)
+  val b = testRelation.output(1)
+  val c = testRelation.output(2)
+
+  test("replace first(col) by nth_value(col, 1)") {
+    val inputPlan = testRelation.select(
+      WindowExpression(
+        First(a, false).toAggregateExpression(),
+        WindowSpecDefinition(b :: Nil, c.asc :: Nil,
+          SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))))
+    val correctAnswer = testRelation.select(
+      WindowExpression(
+        NthValue(a, Literal(1), false),
+        WindowSpecDefinition(b :: Nil, c.asc :: Nil,
+          SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))))
+
+    val optimized = Optimize.execute(inputPlan)
+    assert(optimized == correctAnswer)
+  }
+
+  test("can't replace first(col) by nth_value(col, 1) if the window frame type is row") {

Review comment:
       `row` -> `range`
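
The two test names quoted above suggest the intended behavior of the rule: `first(col)` is rewritten to `nth_value(col, 1)` for a row frame and left unchanged for a range frame. A minimal sketch of that condition follows, in plain Scala with hypothetical stand-in types (`FrameType`, `WindowSpec`, `WindowExpr`, `ToyOptimizeWindowFunctions`). It is not Spark's Catalyst API or the actual `OptimizeWindowFunctions` rule, and the ordering check is an assumption taken from the earlier test title ("if the window frame is ordered").

```scala
// Toy model only: hypothetical stand-ins that mirror the names in the diff,
// not Spark's Catalyst classes.
sealed trait FrameType
case object RowFrame extends FrameType
case object RangeFrame extends FrameType

sealed trait WindowFunction
final case class First(column: String) extends WindowFunction
final case class NthValue(column: String, offset: Int) extends WindowFunction

final case class WindowSpec(orderBy: Seq[String], frameType: FrameType)
final case class WindowExpr(function: WindowFunction, spec: WindowSpec)

object ToyOptimizeWindowFunctions {
  // Rewrite first(col) to nth_value(col, 1) only for an ordered row frame
  // (assumed condition); every other expression is returned unchanged.
  def apply(expr: WindowExpr): WindowExpr = expr match {
    case WindowExpr(First(col), spec @ WindowSpec(orderBy, RowFrame)) if orderBy.nonEmpty =>
      WindowExpr(NthValue(col, 1), spec)
    case other => other
  }
}
```

Under this model, a `RowFrame` spec ordered by `c` yields `NthValue("a", 1)`, while the same spec with `RangeFrame` keeps `First("a")`, which matches the intent of the second test once its name reads `range`.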






[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725511488


   **[Test build #130934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130934/testReport)** for PR 30178 at commit [`fd7e02e`](https://github.com/apache/spark/commit/fd7e02eccb5c4ac6112e34a43634c8f656a447d3).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723025652


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724508955


   **[Test build #130845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130845/testReport)** for PR 30178 at commit [`2f3fbda`](https://github.com/apache/spark/commit/2f3fbda5857d861021aa272d3dec085def4f638b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723895941








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723736444


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35371/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723011063








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725911882








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725499570


   **[Test build #130933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130933/testReport)** for PR 30178 at commit [`f851a4c`](https://github.com/apache/spark/commit/f851a4ceae8eaca7a1460f48f0266b3fd42519af).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class OptimizeWindowFunctionsSuite extends PlanTest `




[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725914668


   **[Test build #130980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130980/testReport)** for PR 30178 at commit [`3a7f4e7`](https://github.com/apache/spark/commit/3a7f4e740eb5a9cecf880bc5cc294b2459e98cf1).




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725412440








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724444130








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725940198








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725973080








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724534636


   Merged build finished. Test FAILed.




[GitHub] [spark] cloud-fan commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-726130588


   thanks, merging to master!




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723858208


   **[Test build #130778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130778/testReport)** for PR 30178 at commit [`0c953ff`](https://github.com/apache/spark/commit/0c953ff5e8b249debcf0fe2050840afdbb176ad8).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723758536


   **[Test build #130764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130764/testReport)** for PR 30178 at commit [`7b99d27`](https://github.com/apache/spark/commit/7b99d2720b38e5f9a67a9c0810cde23b6e1ae797).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725816087


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35565/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723000961


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35319/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725903800








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723823086








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723895941








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723745486








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723809806


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35382/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724680298








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725500517








[GitHub] [spark] cloud-fan commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521390409



##########
File path: sql/core/src/test/resources/sql-tests/inputs/window.sql
##########
@@ -239,6 +264,11 @@ SELECT
 	employee_name,
 	department,
 	salary,
+	FIRST_VALUE(employee_name) OVER  (

Review comment:
       We can use the named window syntax (a `WINDOW` clause) to avoid duplicating the window frame definition.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723758702


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130764/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725411559








[GitHub] [spark] cloud-fan commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521388978



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -806,6 +807,18 @@ object CollapseRepartition extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Substitute the aggregate expression which uses [[First]] as the aggregate function

Review comment:
       nit: `Replaces first(col) to nth_value(col, 1) for better performance.`






[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718408701


   **[Test build #130397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130397/testReport)** for PR 30178 at commit [`181186c`](https://github.com/apache/spark/commit/181186c6cce9c3b4e3061dc84b667ee898dd3f40).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725810045


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35565/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718407119








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718406582


   **[Test build #130393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130393/testReport)** for PR 30178 at commit [`181186c`](https://github.com/apache/spark/commit/181186c6cce9c3b4e3061dc84b667ee898dd3f40).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725411559








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724444110


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35435/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723840707








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723025665


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130709/
   Test FAILed.




[GitHub] [spark] cloud-fan commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r525826043



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -806,6 +807,18 @@ object CollapseRepartition extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Replaces first(col) to nth_value(col, 1) for better performance.
+ */
+object OptimizeWindowFunctions extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+    case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
+      if spec.orderSpec.nonEmpty &&
+        spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame].frameType == RowFrame =>

Review comment:
       Shall we also check that the lower bound is `UnboundedPreceding`? Otherwise we can't use the offset optimization for `nth_value`, and `first` is probably faster than `nth_value(1)`.
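
       The check suggested here can be sketched as a plain predicate. This is only an illustration under stated assumptions: the types below are simplified stand-ins rather than Catalyst's real classes, and `FirstValueRewriteGuard`, `canRewrite`, and `hasOrdering` are hypothetical names that do not appear in the PR.

```scala
// Simplified stand-ins for Catalyst's window frame classes (hypothetical,
// not the real org.apache.spark.sql.catalyst.expressions types).
sealed trait FrameType
case object RowFrame extends FrameType
case object RangeFrame extends FrameType

sealed trait FrameBound
case object UnboundedPreceding extends FrameBound
case object CurrentRow extends FrameBound
case object UnboundedFollowing extends FrameBound

final case class SpecifiedWindowFrame(
    frameType: FrameType,
    lower: FrameBound,
    upper: FrameBound)

object FirstValueRewriteGuard {
  // Rewriting first(col) to nth_value(col, 1) is only considered when the
  // window is ordered, the frame is row-based, and the frame starts at the
  // beginning of the partition (lower bound is UnboundedPreceding).
  def canRewrite(hasOrdering: Boolean, frame: SpecifiedWindowFrame): Boolean =
    hasOrdering &&
      frame.frameType == RowFrame &&
      frame.lower == UnboundedPreceding
}
```

       Under this sketch, a frame such as `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` passes the guard, while `ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING` does not and would keep using `first`.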






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725512560


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130934/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724578203








[GitHub] [spark] beliefer commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r526545753



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -806,6 +807,18 @@ object CollapseRepartition extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Replaces first(col) to nth_value(col, 1) for better performance.
+ */
+object OptimizeWindowFunctions extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+    case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
+      if spec.orderSpec.nonEmpty &&
+        spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame].frameType == RowFrame =>

Review comment:
       OK. I created https://github.com/apache/spark/pull/30419 to add this check.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-726061968








[GitHub] [spark] cloud-fan commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521808339



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -806,6 +807,17 @@ object CollapseRepartition extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Replaces first(col) to nth_value(col, 1) for better performance.
+ */
+object OptimizeWindowFunctions extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+    case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
+      if !spec.orderSpec.isEmpty =>

Review comment:
       Use `nonEmpty` instead of `!isEmpty`.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724444130








[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723858208


   **[Test build #130778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130778/testReport)** for PR 30178 at commit [`0c953ff`](https://github.com/apache/spark/commit/0c953ff5e8b249debcf0fe2050840afdbb176ad8).




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723011063








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723739717








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718362027








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725940198








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724534636








[GitHub] [spark] SparkQA removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723721434


   **[Test build #130762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130762/testReport)** for PR 30178 at commit [`57c7ef1`](https://github.com/apache/spark/commit/57c7ef1fe17fe7448505f8b1976f153b275125af).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725398569


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35536/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725401743


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35535/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725431796


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35540/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723788824


   **[Test build #130773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130773/testReport)** for PR 30178 at commit [`0c953ff`](https://github.com/apache/spark/commit/0c953ff5e8b249debcf0fe2050840afdbb176ad8).




[GitHub] [spark] beliefer commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-726474375


   @cloud-fan Thanks for your help!




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723966740


   **[Test build #130778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130778/testReport)** for PR 30178 at commit [`0c953ff`](https://github.com/apache/spark/commit/0c953ff5e8b249debcf0fe2050840afdbb176ad8).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-726061968








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725868588


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35573/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724420337


   **[Test build #130826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130826/testReport)** for PR 30178 at commit [`2f3fbda`](https://github.com/apache/spark/commit/2f3fbda5857d861021aa272d3dec085def4f638b).




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723823086








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725442665








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725322600


   **[Test build #130930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130930/testReport)** for PR 30178 at commit [`e296eb6`](https://github.com/apache/spark/commit/e296eb65acb5ee30045c71e54777216cc1ebd243).




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725973080








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724534660


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130845/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725458289








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725442641


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35540/
   




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725797504


   **[Test build #130959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130959/testReport)** for PR 30178 at commit [`72ceacc`](https://github.com/apache/spark/commit/72ceacc2677c66d884d5e4f4ac124c3f61975edd).




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723739717








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723840707


   Merged build finished. Test FAILed.




[GitHub] [spark] beliefer commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521785541



##########
File path: sql/core/src/test/resources/sql-tests/inputs/window.sql
##########
@@ -239,6 +264,11 @@ SELECT
 	employee_name,
 	department,
 	salary,
+	FIRST_VALUE(employee_name) OVER  (

Review comment:
       OK






[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718462709








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723758691








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725912407


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130967/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725911942








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725903800








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724578178


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35453/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725912390








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-723840714


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130773/
   Test FAILed.




[GitHub] [spark] beliefer commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521823258



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -806,6 +807,17 @@ object CollapseRepartition extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Replaces first(col) to nth_value(col, 1) for better performance.
+ */
+object OptimizeWindowFunctions extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+    case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
+      if !spec.orderSpec.isEmpty =>

Review comment:
       OK
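
For illustration only: a minimal sketch in plain Scala (no Spark dependency) of why replacing `first(col)` with `nth_value(col, 1)` can preserve results when the window is ordered. It assumes a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW over rows already sorted by the ORDER BY clause, and it ignores null handling. The names `FirstVsNthValue`, `frame`, `firstValue`, and `nthValue` are hypothetical helpers for this sketch, not Spark APIs.

```scala
object FirstVsNthValue {
  // Frame visible to row i under ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
  // every row from the start of the partition up to and including row i.
  def frame[A](rows: Vector[A], i: Int): Vector[A] = rows.take(i + 1)

  // first(col): the first value in the frame, if the frame is non-empty.
  def firstValue[A](frame: Vector[A]): Option[A] = frame.headOption

  // nth_value(col, n): the n-th value in the frame; n is 1-based, as in SQL.
  def nthValue[A](frame: Vector[A], n: Int): Option[A] = frame.lift(n - 1)

  def main(args: Array[String]): Unit = {
    // One partition, already sorted by the (assumed) ORDER BY column.
    val salaries = Vector(3000.0, 4200.0, 5100.0)
    salaries.indices.foreach { i =>
      val f = frame(salaries, i)
      // Both functions read the same slot of the frame, so they agree on every row.
      assert(firstValue(f) == nthValue(f, 1))
      println(s"row $i: first=${firstValue(f)} nth_value(1)=${nthValue(f, 1)}")
    }
  }
}
```

Because the frame always starts at the first row of the partition, both functions return that same row's value for every row in this model.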






[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718407119








[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718691371








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-726059283


   **[Test build #130980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130980/testReport)** for PR 30178 at commit [`3a7f4e7`](https://github.com/apache/spark/commit/3a7f4e740eb5a9cecf880bc5cc294b2459e98cf1).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] beliefer commented on a change in pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #30178:
URL: https://github.com/apache/spark/pull/30178#discussion_r521894055



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.First
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+class OptimizeWindowFunctionsSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("OptimizeWindowFunctions", FixedPoint(10),
+        OptimizeWindowFunctions) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.double, 'b.double, 'c.string)
+  val a = testRelation.output(0)
+  val b = testRelation.output(1)
+  val c = testRelation.output(2)
+
+  test("replace first(col) by nth_value(col, 1)") {
+    val inputPlan = testRelation.select(
+      WindowExpression(
+        First(a, false).toAggregateExpression(),
+        WindowSpecDefinition(b :: Nil, c.asc :: Nil,
+          SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))))
+    val correctAnswer = testRelation.select(
+      WindowExpression(
+        NthValue(a, Literal(1), false),
+        WindowSpecDefinition(b :: Nil, c.asc :: Nil,
+          SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))))
+
+    val optimized = Optimize.execute(inputPlan)
+    assert(optimized == correctAnswer)
+  }
+
+  test("can't replace first(col) by nth_value(col, 1) if the window frame type is row") {

Review comment:
       OK






[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-724680298








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-725973055


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35586/
   

