Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/07 01:14:13 UTC

[GitHub] [spark] wangyum opened a new pull request, #37813: [SPARK-40228][SQL][3.3] Do not simplify multiLike if child is not a cheap expression

wangyum opened a new pull request, #37813:
URL: https://github.com/apache/spark/pull/37813

   This PR backports https://github.com/apache/spark/pull/37672 to branch-3.3.
   
   The original PR's description:
   
   ### What changes were proposed in this pull request?
   
   Do not simplify multiLike if child is not a cheap expression.
   
   ### Why are the changes needed?
   
   1. Simplifying multiLike in these cases does not benefit the query, because the resulting predicates cannot be pushed down.
   2. Skipping the simplification reduces the number of times these expressions are evaluated.
   
      
   For example:
   ```sql
   select * from t1 where substr(name, 1, 5) like any('%a', 'b%', '%c%');
   ```
   ```
   == Physical Plan ==
   *(1) Filter ((EndsWith(substr(name#0, 1, 5), a) OR StartsWith(substr(name#0, 1, 5), b)) OR Contains(substr(name#0, 1, 5), c))
      +- *(1) ColumnarToRow
         +- FileScan parquet default.t1[name#0] Batched: true, DataFilters: [((EndsWith(substr(name#0, 1, 5), a) OR StartsWith(substr(name#0, 1, 5), b)) OR Contains(substr(n..., Format: Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string>
   ```
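
The plan above shows `substr(name#0, 1, 5)` repeated in each of the three rewritten predicates. The guard described in this PR can be sketched with a toy model. This is a minimal illustration, not Spark's code: the `Expr` classes, `MultiLikeSketch`, `simplify`, and this `isCheap` are hypothetical stand-ins for the Catalyst classes and rule, and the sketch assumes that only column references and literals count as cheap and that a rewrite happens only when every pattern is a plain prefix, suffix, or infix match.

```scala
// Toy expression tree; these are NOT Spark's Catalyst classes.
sealed trait Expr
case class Column(name: String) extends Expr
case class Literal(value: String) extends Expr
case class Substr(child: Expr, pos: Int, len: Int) extends Expr
case class LikeAny(child: Expr, patterns: Seq[String]) extends Expr
case class StartsWith(child: Expr, prefix: String) extends Expr
case class EndsWith(child: Expr, suffix: String) extends Expr
case class Contains(child: Expr, infix: String) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

object MultiLikeSketch {
  // Assumption: only column references and literals are cheap in this sketch.
  def isCheap(e: Expr): Boolean = e match {
    case _: Column | _: Literal => true
    case _ => false
  }

  // True when the text is non-empty and has no LIKE wildcard or escape character.
  private def isPlain(s: String): Boolean =
    s.nonEmpty && !s.exists(c => c == '%' || c == '_' || c == '\\')

  // Rewrites a single pattern of the form '%x%', '%x', or 'x%'; None otherwise.
  private def simplifyPattern(child: Expr, p: String): Option[Expr] = {
    if (p.length >= 3 && p.startsWith("%") && p.endsWith("%") &&
        isPlain(p.substring(1, p.length - 1))) {
      Some(Contains(child, p.substring(1, p.length - 1)))
    } else if (p.startsWith("%") && isPlain(p.drop(1))) {
      Some(EndsWith(child, p.drop(1)))
    } else if (p.endsWith("%") && isPlain(p.dropRight(1))) {
      Some(StartsWith(child, p.dropRight(1)))
    } else {
      None
    }
  }

  // Expands LIKE ANY into a chain of ORs only when the child is cheap,
  // so that a costly child is not copied into every predicate.
  def simplify(e: Expr): Expr = e match {
    case LikeAny(child, patterns) if isCheap(child) =>
      val rewritten = patterns.map(simplifyPattern(child, _))
      if (patterns.nonEmpty && rewritten.forall(_.isDefined)) {
        rewritten.flatten.reduceLeft[Expr](Or(_, _))
      } else {
        e
      }
    case other => other
  }
}
```

In this sketch, `LikeAny(Column("name"), ...)` is still expanded into `EndsWith`, `StartsWith`, and `Contains`, while `LikeAny(Substr(Column("name"), 1, 5), ...)` is returned unchanged, so the `substr` call appears once in the expression instead of up to three times.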
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37813: [SPARK-40228][SQL][3.3] Do not simplify multiLike if child is not a cheap expression

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #37813:
URL: https://github.com/apache/spark/pull/37813#discussion_r967496771


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -1075,6 +1075,16 @@ object CollapseProject extends Rule[LogicalPlan] with AliasHelper {
     case _ => false
   }
 
+  def isCheap(e: Expression): Boolean = e match {

Review Comment:
   nit. Please copy the original function description as well when backporting a method.
   ```
   /**
    * Check if the given expression is cheap that we can inline it.
    */
   ``` 





[GitHub] [spark] dongjoon-hyun commented on pull request #37813: [SPARK-40228][SQL][3.3] Do not simplify multiLike if child is not a cheap expression

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #37813:
URL: https://github.com/apache/spark/pull/37813#issuecomment-1242562694

   Merged to branch-3.3. Thank you, @wangyum and @cloud-fan .
   I added the comment during backporting.
   - https://github.com/apache/spark/pull/37813#discussion_r967496771




[GitHub] [spark] dongjoon-hyun closed pull request #37813: [SPARK-40228][SQL][3.3] Do not simplify multiLike if child is not a cheap expression

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #37813: [SPARK-40228][SQL][3.3] Do not simplify multiLike if child is not a cheap expression
URL: https://github.com/apache/spark/pull/37813




[GitHub] [spark] wangyum commented on pull request #37813: [SPARK-40228][SQL][3.3] Do not simplify multiLike if child is not a cheap expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on PR #37813:
URL: https://github.com/apache/spark/pull/37813#issuecomment-1239059249

   cc @cloud-fan 



