Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2014/07/08 01:24:08 UTC

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/1325

    [SPARK-2395][SQL] Optimize common LIKE patterns.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark slowLike

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1325
    
----
commit 6d3d0a004ea7248e78307664a8e9b654a567f8f8
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-07-07T23:17:49Z

    Optimize common LIKE patterns.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48262182
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16391/


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48291668
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14633294
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    --- End diff --
    
    It's not simplifying RLIKE though... it is simplifying LIKE.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48267352
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48271596
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14639873
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    --- End diff --
    
    We are conservative in that case and the optimization won't apply (but we'll still give a correct answer).  I'm happy to update the regex if you have a better one... (this one is taken from Hive... though AFAICT they don't check for `\` at the end at all).
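
The conservative behaviour described above can be sketched in a few lines of plain Scala. This is a minimal illustration without any Spark dependency, not the code in the pull request: `EscapeGuardSketch` and `safePrefix` are hypothetical names, and the guard on a trailing `\` is an assumption about what such a check could look like. It covers only the trailing-backslash case mentioned in the thread.

```scala
object EscapeGuardSketch {
  // Same regex as in the diff: a run of non-wildcard characters, then one '%'.
  val startsWith = "([^_%]+)%".r

  // Returns Some(prefix) only when the rewrite to a prefix check is assumed safe.
  // A prefix ending in '\' means the final '%' was escaped, so we decline and
  // the caller keeps the original LIKE expression.
  def safePrefix(pattern: String): Option[String] = pattern match {
    case startsWith(p) if !p.endsWith("\\") => Some(p)
    case _ => None
  }
}
```

Declining the rewrite never changes the query result. It only means the pattern is still evaluated by the general LIKE implementation.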


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14640188
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    --- End diff --
    
    added.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48282385
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16398/


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48276050
  
    LGTM, +1


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48271598
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16394/


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48291669
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16399/


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48281644
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48281631
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14639691
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    --- End diff --
    
    I think this is taken care of below with the if guard.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14634396
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    +  val endsWith = "%([^_%]+)".r
    +  val contains = "%([^_%]+)%".r
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +    case Like(l, Literal(startsWith(pattern), StringType)) => StartsWith(l, Literal(pattern))
    --- End diff --
    
    Yeah, this is a very cool optimization, but it should support escape characters in the user input; otherwise there is a bug.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14633343
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    +  val endsWith = "%([^_%]+)".r
    +  val contains = "%([^_%]+)%".r
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +    case Like(l, Literal(startsWith(pattern), StringType)) => StartsWith(l, Literal(pattern))
    --- End diff --
    
    I'm not sure what problem you are describing here.  This rule takes a query that is `attr LIKE 'something%'` and turns it into the much faster `attr.startsWith('something')`.  Can you describe a case where that is not safe?


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14633804
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    +  val endsWith = "%([^_%]+)".r
    +  val contains = "%([^_%]+)%".r
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +    case Like(l, Literal(startsWith(pattern), StringType)) => StartsWith(l, Literal(pattern))
    --- End diff --
    
    Yeah, those regexes are just finding common LIKE patterns that we can execute more efficiently.  LIKE 'a%a' for example would not match any of the cases and would be left as a LIKE.
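
To make the matching behaviour concrete, here is a small standalone sketch that reuses the three regexes from the diff and classifies a LIKE pattern string. It assumes nothing about Catalyst: `LikePatternSketch`, `classify`, and the result case classes are hypothetical names introduced only for illustration, and escape handling is deliberately left out.

```scala
object LikePatternSketch {
  // Illustrative result types; these are not the Catalyst expression classes.
  sealed trait Simplified
  case class StartsWith(prefix: String) extends Simplified
  case class EndsWith(suffix: String) extends Simplified
  case class Contains(infix: String) extends Simplified
  case class KeepLike(pattern: String) extends Simplified

  // The three regexes from the diff under review.
  val startsWith = "([^_%]+)%".r
  val endsWith = "%([^_%]+)".r
  val contains = "%([^_%]+)%".r

  // A Scala Regex used as an extractor must match the whole string, so a
  // pattern with a wildcard in the middle falls through to KeepLike.
  def classify(pattern: String): Simplified = pattern match {
    case startsWith(p) => StartsWith(p)
    case endsWith(p) => EndsWith(p)
    case contains(p) => Contains(p)
    case other => KeepLike(other)
  }
}
```

With this sketch, `classify("a%a")` yields `KeepLike("a%a")`, which mirrors the point above: patterns outside the three common shapes are left as an ordinary LIKE.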


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48267346
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14633159
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    --- End diff --
    
    This would probably be better named `RLikeSimplification`.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14633268
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    +  val endsWith = "%([^_%]+)".r
    +  val contains = "%([^_%]+)%".r
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +    case Like(l, Literal(startsWith(pattern), StringType)) => StartsWith(l, Literal(pattern))
    --- End diff --
    
    Even for `RLike` simplification, people may still prefer using the `Like` function rather than `RLike` for `startsWith`, `endsWith` and `contains` string matching.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1325


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48255772
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48281746
  
    LGTM. Let's merge once Jenkins comes back happy.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14633650
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    +  val endsWith = "%([^_%]+)".r
    +  val contains = "%([^_%]+)%".r
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +    case Like(l, Literal(startsWith(pattern), StringType)) => StartsWith(l, Literal(pattern))
    --- End diff --
    
    Oh, sorry, @marmbrus, I see what you mean.
    When I write `xxx%` to match strings starting with `xxx`, the `xxx%` is expanded to `xxx.*` inside `Like`, rather than to `([^_%]+)%`. The same goes for `endsWith` and `contains`.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14633377
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    +  val endsWith = "%([^_%]+)".r
    +  val contains = "%([^_%]+)%".r
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +    case Like(l, Literal(startsWith(pattern), StringType)) => StartsWith(l, Literal(pattern))
    --- End diff --
    
    I'll also note that this is not meant to be exhaustive, but only to cover some common cases that we can speed up 10x.
    
    Hive does the same thing: https://github.com/apache/hive/blob/590c37f075def63e8507f2bfca820308b40e78b3/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterStringColLikeStringScalar.java#L64


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48255414
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48282759
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14633801
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    +  val endsWith = "%([^_%]+)".r
    +  val contains = "%([^_%]+)%".r
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +    case Like(l, Literal(startsWith(pattern), StringType)) => StartsWith(l, Literal(pattern))
    --- End diff --
    
    Oh, sorry, @marmbrus, I see what you mean. Please ignore my previous comment.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48282477
  
    Jenkins, retest this please.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48262181
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48260743
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48254419
  
    Merged build started. 


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48255409
  
     Merged build triggered. 


---

[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14633204
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    +  val endsWith = "%([^_%]+)".r
    +  val contains = "%([^_%]+)%".r
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +    case Like(l, Literal(startsWith(pattern), StringType)) => StartsWith(l, Literal(pattern))
    --- End diff --
    
    `Like` does not work this way; it only supports `\` for escaping, `%` for matching any sequence of characters, and `_` for matching a single character.
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
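
To make the semantics described above concrete, here is a minimal standalone sketch that translates a LIKE pattern into a Java regex. The object and method names (`LikeSemanticsSketch`, `likeToRegex`, `like`) are hypothetical and are not taken from the pull request or from Hive; the sketch assumes `\` escapes exactly the next character and that a trailing lone `\` is treated as a literal.

```scala
import java.util.regex.Pattern

object LikeSemanticsSketch {
  // Hypothetical helper: translate a LIKE pattern into a Java regex string.
  // Assumed rules: `\x` matches the literal character x, `%` matches any
  // sequence of characters, `_` matches exactly one character.
  def likeToRegex(pattern: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      pattern.charAt(i) match {
        case '\\' if i + 1 < pattern.length =>
          // Escaped character: emit it as a quoted literal and skip it.
          out.append(Pattern.quote(pattern.charAt(i + 1).toString))
          i += 1
        case '%' => out.append(".*")
        case '_' => out.append(".")
        case c   => out.append(Pattern.quote(c.toString))
      }
      i += 1
    }
    out.toString
  }

  // Whole-string match, with DOTALL so wildcards also cover newlines.
  def like(input: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(input).matches()
}
```

Under these assumptions, `like("abcdef", "abc%")` is true, while `like("abcdef", "abcd\\%")` is false because the escaped `%` only matches a literal percent sign.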



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14639677
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    --- End diff --
    
    This is actually pretty tricky, because `%` can be escaped.



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48255767
  
     Merged build triggered. 



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48261803
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48254410
  
     Merged build triggered. 



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14640036
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    --- End diff --
    
    Actually this sounds good (it's just not a case to optimize for). Can we add an inline comment to explain that? E.g. "This doesn't match cases like "abcd\\%", but that doesn't affect correctness."
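
As an illustration of the escape concern, the sketch below reuses the `startsWith` regex from the diff in a standalone setting, without any Spark classes. The `Simplified`, `Prefix`, and `KeepLike` types and the `simplify` function are hypothetical stand-ins for the optimizer rule, and the backslash guard is an assumption about one way to skip escaped patterns; the quoted diff does not show whether the pull request does this.

```scala
object LikeSimplificationSketch {
  // Same regex as in the diff: one or more non-wildcard characters, then `%`.
  val startsWith = "([^_%]+)%".r

  // Hypothetical result type standing in for the rewritten expression.
  sealed trait Simplified
  case class Prefix(prefix: String) extends Simplified // use a prefix check
  case object KeepLike extends Simplified              // keep the regex-based LIKE

  def simplify(pattern: String): Simplified = pattern match {
    // Assumed guard: a pattern containing `\` may escape a wildcard, so it is
    // conservatively left alone. Those patterns are not optimized, but the
    // result stays correct.
    case startsWith(prefix) if !prefix.contains("\\") => Prefix(prefix)
    case _ => KeepLike
  }
}
```

Note that the character class `[^_%]` accepts a backslash, so without the guard the regex alone would capture `abcd\` from the pattern `abcd\%`. With the guard, `simplify("abc%")` yields `Prefix("abc")` and `simplify("abcd\\%")` yields `KeepLike`.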



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48260744
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16389/



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14629237
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -156,3 +156,54 @@ case class Lower(child: Expression) extends UnaryExpression with CaseConversionE
     
       override def toString() = s"Lower($child)"
     }
    +
    +/** An base class for functions that compares two strings, returning a boolean */
    --- End diff --
    
    nit: A



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48282384
  
    Merged build finished. 



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48261804
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16390/



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1325#discussion_r14639717
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -112,6 +113,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
    + * For example, when the expression is just checking to see if a string starts with a given
    + * pattern.
    + */
    +object LikeSimplification extends Rule[LogicalPlan] {
    +  val startsWith = "([^_%]+)%".r
    --- End diff --
    
    What about this?
    ```
    LIKE "abcd\\%"
    ```



[GitHub] spark pull request: [SPARK-2395][SQL] Optimize common LIKE pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1325#issuecomment-48282775
  
    Merged build started. 

