Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/15 04:04:14 UTC

[GitHub] [spark] beliefer opened a new pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

beliefer opened a new pull request #35520:
URL: https://github.com/apache/spark/pull/35520


   ### What changes were proposed in this pull request?
   This PR follows up https://github.com/apache/spark/pull/35166.
   The DB2 documentation referenced previously was incorrect, so the dialect could not compile some aggregate functions that DB2 actually supports.
   
   The correct documentation is https://www.ibm.com/docs/en/db2/11.5?topic=af-regression-functions-regr-avgx-regr-avgy-regr-count
   
   
   ### Why are the changes needed?
   Make the built-in DB2 dialect support complete aggregate push-down for more aggregate functions.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   Users can use complete aggregate push-down with the built-in DB2 dialect.
   
   
   ### How was this patch tested?
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1044603923


   thanks, merging to master!




[GitHub] [spark] beliefer commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1039831787








[GitHub] [spark] beliefer edited a comment on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
beliefer edited a comment on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1039831787


   ping @HyukjinKwon @huaxingao cc @cloud-fan 




[GitHub] [spark] beliefer commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1041059080


   > The change looks good to me. Just one more thing: for the aggregate functions that support `DISTINCT`, can we test the `DISTINCT` keyword too? We didn't catch the problem last time because we didn't have tests for this.
   
   OK




[GitHub] [spark] huaxingao commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1039859546


   @beliefer could you include the db2 link in `DB2Dialect`? Also include the zOS/DB2 SQL reference link. The one I checked just now is this https://www.ibm.com/docs/en/SSEPEK_11.0.0/pdf/db2z_11_sqlrefbook.pdf




[GitHub] [spark] huaxingao commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1041050219


   The change looks good to me. Just one more thing: for the aggregate functions that support `DISTINCT`, can we test the `DISTINCT` keyword too? We didn't catch the problem last time because we didn't have tests for this.
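
As a rough illustration of the kind of coverage being asked for, the sketch below exercises each single-argument function both with and without `DISTINCT`. It is a self-contained stand-in, not Spark's test suite: `DistinctCoverageSketch`, `compileUnary`, and the `BONUS` column name are hypothetical, and only the DB2 function names are taken from the diff under review.

```scala
// Hypothetical stand-in for the single-argument cases in the diff; not Spark's API.
object DistinctCoverageSketch {
  // Builds the DB2 call text the same way the dialect cases do.
  def compileUnary(db2Name: String, isDistinct: Boolean, column: String): String = {
    val distinct = if (isDistinct) "DISTINCT " else ""
    s"$db2Name($distinct$column)"
  }

  // Every function is generated twice: once plain, once with DISTINCT.
  // BONUS is an assumed column name used only for illustration.
  val generated: Seq[(String, Boolean, String)] =
    for {
      fn <- Seq("VARIANCE", "VARIANCE_SAMP", "STDDEV", "STDDEV_SAMP")
      isDistinct <- Seq(false, true)
    } yield (fn, isDistinct, compileUnary(fn, isDistinct, "BONUS"))

  // Checks that the DISTINCT keyword appears exactly when it was requested.
  def check(): Unit =
    generated.foreach { case (fn, isDistinct, sql) =>
      val expected = if (isDistinct) s"$fn(DISTINCT BONUS)" else s"$fn(BONUS)"
      assert(sql == expected, s"unexpected SQL for $fn: $sql")
    }
}
```

Covering both branches of the `isDistinct` flag for every function is what would have surfaced a function that rejects `DISTINCT` on the database side.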




[GitHub] [spark] beliefer commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1045508613


   @huaxingao Thank you for helping me review this. @cloud-fan Thank you for your help too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] huaxingao commented on a change in pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #35520:
URL: https://github.com/apache/spark/pull/35520#discussion_r806444721



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala
##########
@@ -37,6 +37,26 @@ private object DB2Dialect extends JdbcDialect {
           assert(f.inputs().length == 1)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"VARIANCE($distinct${f.inputs().head})")
+        case f: GeneralAggregateFunc if f.name() == "VAR_SAMP" =>
+          assert(f.inputs().length == 1)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"VARIANCE_SAMP($distinct${f.inputs().head})")
+        case f: GeneralAggregateFunc if f.name() == "STDDEV_POP" =>
+          assert(f.inputs().length == 1)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"STDDEV($distinct${f.inputs().head})")
+        case f: GeneralAggregateFunc if f.name() == "STDDEV_SAMP" =>
+          assert(f.inputs().length == 1)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"STDDEV_SAMP($distinct${f.inputs().head})")
+        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" =>
+          assert(f.inputs().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"COVARIANCE($distinct${f.inputs().head}, ${f.inputs().last})")

Review comment:
       <img width="397" alt="Screen Shot 2022-02-14 at 8 47 54 PM" src="https://user-images.githubusercontent.com/13592258/153994415-9778115a-dbf6-403c-8a81-487d551301f2.png">
   
   `DISTINCT` is not supported in `COVARIANCE`

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala
##########
@@ -37,6 +37,26 @@ private object DB2Dialect extends JdbcDialect {
           assert(f.inputs().length == 1)
           val distinct = if (f.isDistinct) "DISTINCT " else ""
           Some(s"VARIANCE($distinct${f.inputs().head})")
+        case f: GeneralAggregateFunc if f.name() == "VAR_SAMP" =>
+          assert(f.inputs().length == 1)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"VARIANCE_SAMP($distinct${f.inputs().head})")
+        case f: GeneralAggregateFunc if f.name() == "STDDEV_POP" =>
+          assert(f.inputs().length == 1)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"STDDEV($distinct${f.inputs().head})")
+        case f: GeneralAggregateFunc if f.name() == "STDDEV_SAMP" =>
+          assert(f.inputs().length == 1)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"STDDEV_SAMP($distinct${f.inputs().head})")
+        case f: GeneralAggregateFunc if f.name() == "COVAR_POP" =>
+          assert(f.inputs().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"COVARIANCE($distinct${f.inputs().head}, ${f.inputs().last})")
+        case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" =>
+          assert(f.inputs().length == 2)
+          val distinct = if (f.isDistinct) "DISTINCT " else ""
+          Some(s"COVARIANCE_SAMP($distinct${f.inputs().head}, ${f.inputs().last})")

Review comment:
       <img width="423" alt="Screen Shot 2022-02-14 at 8 49 39 PM" src="https://user-images.githubusercontent.com/13592258/153994615-f3985ccb-4ce4-4a65-8454-a34923f46c9a.png">
   
   `DISTINCT` is not supported in `COVARIANCE_SAMP`
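
One way this feedback could be addressed is to match the two covariance functions only when `DISTINCT` is absent, so that a `DISTINCT` call is simply not pushed down. The sketch below is an assumption about the shape of such a fix, not the code that was merged: `AggCall` is a hypothetical stand-in for Spark's `GeneralAggregateFunc` so that the example runs without Spark, and `Db2AggregateSketch.compile` is likewise illustrative.

```scala
// Hypothetical stand-in for Spark's GeneralAggregateFunc, so the sketch runs without Spark.
final case class AggCall(name: String, isDistinct: Boolean, inputs: Seq[String])

object Db2AggregateSketch {
  // Returns Some(sql) when the call can be pushed down to DB2, None otherwise.
  def compile(f: AggCall): Option[String] = f match {
    // Single-argument functions keep the optional DISTINCT keyword.
    case AggCall("VAR_SAMP", distinct, Seq(x)) =>
      val d = if (distinct) "DISTINCT " else ""
      Some(s"VARIANCE_SAMP($d$x)")
    // Covariance functions match only when DISTINCT is not requested.
    case AggCall("COVAR_POP", false, Seq(x, y)) =>
      Some(s"COVARIANCE($x, $y)")
    case AggCall("COVAR_SAMP", false, Seq(x, y)) =>
      Some(s"COVARIANCE_SAMP($x, $y)")
    // Anything else, including DISTINCT covariance, is left uncompiled.
    case _ => None
  }
}
```

Returning `None` for the unsupported combination leaves the aggregate to be evaluated outside the database instead of sending DB2 a statement it would reject.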






[GitHub] [spark] beliefer commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1040206369


   > @beliefer could you include the db2 link in `DB2Dialect`? Also include the zOS/DB2 SQL reference link. The one I checked just now is this https://www.ibm.com/docs/en/SSEPEK_11.0.0/pdf/db2z_11_sqlrefbook.pdf
   
   OK




[GitHub] [spark] cloud-fan closed pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #35520:
URL: https://github.com/apache/spark/pull/35520


   




[GitHub] [spark] beliefer commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1039831787


   ping @huaxingao cc @cloud-fan 




[GitHub] [spark] huaxingao commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1041086038


   LGTM




[GitHub] [spark] beliefer commented on pull request #35520: [SPARK-37867][SQL][FOLLOWUP] Compile aggregate functions for build-in DB2 dialect

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #35520:
URL: https://github.com/apache/spark/pull/35520#issuecomment-1040207062


   @huaxingao Please review this again.



