Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/02 11:53:46 UTC

[GitHub] [spark] chaojun-zhang opened a new pull request #31710: [SPARK-34595][sql] DPP support RLIKE

chaojun-zhang opened a new pull request #31710:
URL: https://github.com/apache/spark/pull/31710


   What changes were proposed in this pull request?
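   This PR makes dynamic partition pruning (DPP) support LIKE and RLIKE expressions, for example:
   
   SELECT date_id, product_id FROM fact_sk f
   JOIN dim_store s
   ON f.store_id = s.store_id WHERE s.country RLIKE ANY '[DE|US]'
   
   Why are the changes needed?
   Improves query performance by letting DPP prune partitions when the dimension-side filter is a LIKE or RLIKE predicate.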
   
   Does this PR introduce any user-facing change?
   No.
   
   How was this patch tested?
   Unit test.
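
   A note on the example pattern: in Java regular-expression syntax, `[DE|US]` is a character class (any one of `D`, `E`, `|`, `U`, `S`), not the alternation `DE|US`. The sketch below illustrates the difference with plain `java.util.regex`, assuming RLIKE matches anywhere in the string (modeled here with `Matcher.find()`). The object name `RlikePatternCheck`, the helper `rlike`, and the sample country codes are hypothetical and not part of the patch.

```scala
import java.util.regex.Pattern

object RlikePatternCheck {
  // Assumption: RLIKE succeeds if the regex matches any substring,
  // which Matcher.find() models.
  def rlike(input: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(input).find()

  def main(args: Array[String]): Unit = {
    val charClass = "[DE|US]"   // pattern as written in the PR description
    val alternation = "DE|US"   // hypothetical alternative: literal "DE" or "US"

    // Hypothetical sample values, chosen only to show the difference.
    for (country <- Seq("DE", "US", "SE", "NL")) {
      println(s"$country: charClass=${rlike(country, charClass)}, " +
        s"alternation=${rlike(country, alternation)}")
    }
  }
}
```

   With these sample values, `SE` matches the character class (it contains `S` and `E`) but not the alternation, so the two patterns select different rows whenever such values exist.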


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] chaojun-zhang edited a comment on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
chaojun-zhang edited a comment on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-789657504


   > @chaojun-zhang - I think this PR is legit besides @maropu's comment. Wondering why close it?
   
   Sorry, there is another PR I created (linked below) that is the same as this one. Actually, I should have closed that one instead, so just ignore this one.
   
   https://github.com/apache/spark/pull/31722




[GitHub] [spark] maropu commented on a change in pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #31710:
URL: https://github.com/apache/spark/pull/31710#discussion_r585502010



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -18,7 +18,6 @@
 package org.apache.spark.sql
 
 import org.scalatest.GivenWhenThen
-

Review comment:
       please revert this.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1403,6 +1402,52 @@ abstract class DynamicPartitionPruningSuiteBase
       )
     }
   }
+
+  test("SPARK-34436: DPP support Like/RLike expression") {
+
+

Review comment:
       please remove the two unnecessary blanks above.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1403,6 +1402,52 @@ abstract class DynamicPartitionPruningSuiteBase
       )
     }
   }
+
+  test("SPARK-34436: DPP support Like/RLike expression") {
+
+
+    withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+      val df = sql(
+        """
+          |SELECT date_id, product_id FROM fact_sk f
+          |JOIN dim_store s
+          |ON f.store_id = s.store_id WHERE s.country LIKE  '%D%'
+        """.stripMargin)
+
+      checkPartitionPruningPredicate(df, false, true)
+
+      checkAnswer(df,
+        Row(1030, 2) ::
+          Row(1040, 2) ::
+          Row(1050, 2) ::
+          Row(1060, 2) :: Nil
+      )
+    }
+    withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+      val df = sql(
+        """
+          |SELECT date_id, product_id FROM fact_sk f
+          |JOIN dim_store s
+          |ON f.store_id = s.store_id WHERE s.country RLIKE  '[DE|US]'
+        """.stripMargin)
+
+      checkPartitionPruningPredicate(df, false, true)
+
+

Review comment:
       nit: please remove the single blank above.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788884909


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40238/
   




[GitHub] [spark] maropu commented on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788855203


   ok to test




[GitHub] [spark] AmplabJenkins commented on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788883606


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135657/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788850210


   Can one of the admins verify this patch?




[GitHub] [spark] chaojun-zhang commented on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
chaojun-zhang commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-789657504


   > @chaojun-zhang - I think this PR is legit besides @maropu's comment. Wondering why close it?
   
   Sorry, there is another PR I created (linked below) that is the same as this one. Actually, I should have closed that one instead, so just ignore this one.
   
   https://github.com/apache/spark/pull/31722




[GitHub] [spark] SparkQA commented on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788870647


   **[Test build #135657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135657/testReport)** for PR 31710 at commit [`f13e549`](https://github.com/apache/spark/commit/f13e5499fdc9c18c0d5a5108164178755c0301b9).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788883606


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135657/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788869605


   **[Test build #135657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135657/testReport)** for PR 31710 at commit [`f13e549`](https://github.com/apache/spark/commit/f13e5499fdc9c18c0d5a5108164178755c0301b9).




[GitHub] [spark] chaojun-zhang closed pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
chaojun-zhang closed pull request #31710:
URL: https://github.com/apache/spark/pull/31710


   




[GitHub] [spark] AmplabJenkins commented on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788884909


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40238/
   




[GitHub] [spark] SparkQA commented on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788882500


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40238/
   




[GitHub] [spark] SparkQA commented on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788884887


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40238/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788892941


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA commented on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788869605


   **[Test build #135657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135657/testReport)** for PR 31710 at commit [`f13e549`](https://github.com/apache/spark/commit/f13e5499fdc9c18c0d5a5108164178755c0301b9).




[GitHub] [spark] chaojun-zhang edited a comment on pull request #31710: [SPARK-34595][SQL] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
chaojun-zhang edited a comment on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-789657504


   > @chaojun-zhang - I think this PR is legit besides @maropu's comment. Wondering why close it?
   
   There is another PR I created (linked below) that is the same as this one. I was supposed to close that one and re-open this PR, but for now please just disregard this PR. Apologies for any inconvenience.
   
   https://github.com/apache/spark/pull/31722




[GitHub] [spark] AmplabJenkins commented on pull request #31710: [SPARK-34595][sql] DPP support RLIKE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31710:
URL: https://github.com/apache/spark/pull/31710#issuecomment-788850210


   Can one of the admins verify this patch?

