You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by OopsOutOfMemory <gi...@git.apache.org> on 2014/10/29 13:09:35 UTC

[GitHub] spark pull request: [SQL] Add string operation function trim, ltri...

GitHub user OopsOutOfMemory opened a pull request:

    https://github.com/apache/spark/pull/2998

    [SQL] Add string operation function trim, ltrim, rtrim, length to support SparkSql (HiveQL)

    @marmbrus @chenghao-intel
    Add three string operation functions to support spark sql and hiveql.
    eg:
    sql("select trim('   a  b   ')  from src ").collect()    --> 'a  b'
    sql("select ltrim('   a  b   ')  from src ").collect()    --> 'a  b   ' 
    sql("select rtrim('   a  b   ')  from src ").collect()   --> '   a  b'
    sql("select length('ab')  from src ").collect()   --> 2
    
    And Rename the trait of stringOperations.scala.
    I prefer to rename trait CaseConversionExpression   to  StringTransformationExpression, it is more make sence than before so that this trait can support more string transformation but not only  caseconversion. 
    
    And also add a trait StringCalculationExpression that do string computation like  length, indexof  etc....

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/OopsOutOfMemory/spark sparksql

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2998
    
----
commit 58094965a090f9dfd9271b2183f57c1d03e31d62
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-28T08:43:50Z

    Add description of SQL CLI and HiveServer.

commit c577589e93441e2fa3577e92007b7def546c9d9e
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-28T08:48:12Z

    Merge branch 'master' of https://github.com/apache/spark

commit e654704cf13c3081bc01d2f1a00599c07c5a9a2e
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T02:06:37Z

    Merge branch 'master' of https://github.com/apache/spark

commit 3269eab23ca8adfc746ced8ddef391f51af43677
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T06:55:15Z

    Add system function trim

commit ce6899abe365ade3b776bd65e110f0a7e679df3e
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T07:31:51Z

    HiveQL support trim

commit e2781ee7ff0a32fba7854cdd618a37e20c829e7d
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T07:49:41Z

    modify keyword of Trim

commit 2166c7765abea43d86ccf3df4b891f5949998241
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T08:23:03Z

    correct spelling mistake

commit a4f4e4b5d119a4479c7594987e364de23e89ee16
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T08:27:31Z

    correct spelling mistake

commit 2137e28f5e56c7f2f203dacfd1b9eca2b8083c17
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T08:59:25Z

    add test suit for trim

commit 10d8ace1633dd52305fe0fcab9fe6fcb1ddd55d8
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T09:36:01Z

    support 3 functions : ltrim  rtrim  length in SparkQL and HiveQl

commit 0a0f4e0aa922453f1a8f7a0cbc5b3349de798075
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T09:58:39Z

    change the name of the trait CaseConversionExpression to StringTransformationExpression and add a StringCalculationExpression for string calculation in stringOperation.scala , eg: trim is for transformation and length is for calculation

commit 0fa2cd6e986bc0c56abbf6f0e1e86491db6aecf0
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T10:05:54Z

    change return type

commit b358048d6e0598238c866cd6849145da4610e2b7
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T11:47:27Z

    	deleted:    sql/README.md

commit 558d7bf37a154c17ea7d5af1daddd11255938afd
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T11:48:48Z

    	new file:   sql/README.md

commit ab29a7e892b09e01171fcfd217f2cfb80d66d651
Author: OopsOutOfMemory <vi...@126.com>
Date:   2014-10-29T11:50:39Z

    Merge branch 'sparksql' of https://github.com/OopsOutOfMemory/spark into sparksql

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19653069
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
       override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
     }
     
    +
    +/**
    + * A function that strip whitespace (or other characters) from the beginning of a string
    + */
    +case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("^\\s+", "")
    --- End diff --
    
    I am not so sure if the regex works here, but another option is likely to be use the `org.apache.commons.lang.StringUtils`, see https://github.com/apache/hive/blob/b8250ac2f30539f6b23ce80a20a9e338d3d31458/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19654021
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
       override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
     }
     
    +
    +/**
    + * A function that strip whitespace (or other characters) from the beginning of a string
    + */
    +case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("^\\s+", "")
    +
    +    override def toString() = s"Ltrim($child)"
    +}
    +
    +/**
    + * A function that strip whitespace (or other characters) from the end of a string
    + */
    +case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("\\s+$", "")
    +
    +    override def toString() = s"Rtrim($child)"
    +}
    +
    +/**
    + * A function that calculate the length of a string
    + */
    +case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression {
    +
    --- End diff --
    
    so it is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] Add string operation function trim, ltri...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-60912669
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19652932
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
       override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
     }
     
    +
    +/**
    + * A function that strip whitespace (or other characters) from the beginning of a string
    + */
    +case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("^\\s+", "")
    +
    +    override def toString() = s"Ltrim($child)"
    +}
    +
    +/**
    + * A function that strip whitespace (or other characters) from the end of a string
    + */
    +case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("\\s+$", "")
    +
    +    override def toString() = s"Rtrim($child)"
    +}
    +
    +/**
    + * A function that calculate the length of a string
    + */
    +case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression {
    +
    --- End diff --
    
    You have to override the `override def dataType=IntegerType`, it's StringType by default. That's why it causes failure in  the unittest. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61591017
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22856/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by tianyi <gi...@git.apache.org>.
Github user tianyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19781402
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -500,6 +500,23 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll {
             (2, "abc"),
             (3, null)))
       }
    +  test("system function trim()") {
    +    checkAnswer(
    +      sql("SELECT N,TRIM(L) FROM untrimmedData"),
    +      Seq(
    +        (1, "Good"),
    +        (2, "To"),
    +        (3, "See"),
    +        (4, "You !")
    +        ))
    +
    +    checkAnswer(
    +      sql("SELECT n, TRIM(s) FROM nullStrings"),
    +        Seq(
    --- End diff --
    
    indent issue


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61597430
  
      [Test build #22861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22861/consoleFull) for   PR 2998 at commit [`addfbd9`](https://github.com/apache/spark/commit/addfbd97667ca3da712a32325928c23b8827a307).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait StringTransformationExpression `
      * `trait StringCalculationExpression `
      * `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
      * `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-67644023
  
    @marmbrus  
    sorry for comment it late.
    yeah, I agree with you~
    :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19653967
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -92,6 +92,29 @@ trait CaseConversionExpression {
       }
     }
     
    +
    +trait StringCalculationExpression {
    --- End diff --
    
    @chenghao-intel yep, what I mean is to add a trait to support return type of Integer. Because function like LENGTH(n)  which compute the size of a string return Integer,  INSTR(chr1,chr2,[n,[m]]) which compute a position also return Integer. But I'm not sure whether this will cause some side effect or not. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19653172
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -92,6 +92,29 @@ trait CaseConversionExpression {
       }
     }
     
    +
    +trait StringCalculationExpression {
    --- End diff --
    
    This trait seems only used by `Length`, how about just merge them? I think we can extract the trait in the future if we have more string operators returns `IntegerType`. Sorry not sure if you're planning on this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61227669
  
    @chenghao-intel thanks for your review and comment :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61157614
  
      [Test build #22550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22550/consoleFull) for   PR 2998 at commit [`ab29a7e`](https://github.com/apache/spark/commit/ab29a7e892b09e01171fcfd217f2cfb80d66d651).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61222708
  
    @OopsOutOfMemory thank you working on this, it will be nice if we have those functions. I have some comments on it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61162273
  
      [Test build #22550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22550/consoleFull) for   PR 2998 at commit [`ab29a7e`](https://github.com/apache/spark/commit/ab29a7e892b09e01171fcfd217f2cfb80d66d651).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait StringTransformationExpression `
      * `trait StringCalculationExpression `
      * `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
      * `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61437581
  
      [Test build #22793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22793/consoleFull) for   PR 2998 at commit [`dca6adb`](https://github.com/apache/spark/commit/dca6adb4f4e2fb8f55e9b165510acf787af0ad08).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait StringTransformationExpression `
      * `trait StringCalculationExpression `
      * `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
      * `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19653994
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
       override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
     }
     
    +
    +/**
    + * A function that strip whitespace (or other characters) from the beginning of a string
    + */
    +case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("^\\s+", "")
    --- End diff --
    
    Good idea, it's more mature to use the commons-lang jar than the regex expression. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19716628
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -23,7 +23,8 @@ import scala.collection.IndexedSeqOptimized
     
     
     import org.apache.spark.sql.catalyst.analysis.UnresolvedException
    -import org.apache.spark.sql.catalyst.types.{BinaryType, BooleanType, DataType, StringType}
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.commons.lang.StringUtils
    --- End diff --
    
    Is there a reason to add this dependency to catalyst?  Doesn't Scala have these string functions natively?  Are these significantly faster?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61587525
  
      [Test build #22855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22855/consoleFull) for   PR 2998 at commit [`5989358`](https://github.com/apache/spark/commit/598935896e44d2ce0d0f4d68defb8ef3c7e3d805).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61591014
  
      [Test build #22856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22856/consoleFull) for   PR 2998 at commit [`0925b32`](https://github.com/apache/spark/commit/0925b323a0265bc632e334b9469ece4213ee7616).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait StringTransformationExpression `
      * `trait StringCalculationExpression `
      * `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
      * `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19717734
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -23,7 +23,8 @@ import scala.collection.IndexedSeqOptimized
     
     
     import org.apache.spark.sql.catalyst.analysis.UnresolvedException
    -import org.apache.spark.sql.catalyst.types.{BinaryType, BooleanType, DataType, StringType}
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.commons.lang.StringUtils
    --- End diff --
    
    @chenghao-intel it's a good advice, this also should look into the their implementations. I will do it later :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by tianyi <gi...@git.apache.org>.
Github user tianyi commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61582413
  
    You have added a empty file "case sensitivity" in golden files, is it related to this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2998


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19653078
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
       override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
     }
     
    +
    +/**
    + * A function that strip whitespace (or other characters) from the beginning of a string
    + */
    +case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("^\\s+", "")
    +
    +    override def toString() = s"Ltrim($child)"
    +}
    +
    +/**
    + * A function that strip whitespace (or other characters) from the end of a string
    + */
    +case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("\\s+$", "")
    --- End diff --
    
    Use `org.apache.commons.lang.StringUtils` instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61359340
  
    @marmbrus  test passed, this can be merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61587664
  
      [Test build #22856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22856/consoleFull) for   PR 2998 at commit [`0925b32`](https://github.com/apache/spark/commit/0925b323a0265bc632e334b9469ece4213ee7616).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61162284
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22550/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61597433
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22861/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-67377479
  
    Thanks for working on this, but I'm afraid that our current approach to adding functions is becoming unsustainable. I've detailed the reasons in SPARK-4867. For this reason, I propose we close this issue for now and reopen it once that work is complete. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61590872
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22855/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19653194
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -92,6 +92,29 @@ trait CaseConversionExpression {
       }
     }
     
    +
    +trait StringCalculationExpression {
    +    self: UnaryExpression =>
    +
    +    type EvaluatedType = Any
    +
    +    def calc(v: String): Int
    +
    +    override def foldable: Boolean = child.foldable
    +    def nullable: Boolean = child.nullable
    +    def dataType: DataType = StringType
    --- End diff --
    
    `IntegerType` instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19717646
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -23,7 +23,8 @@ import scala.collection.IndexedSeqOptimized
     
     
     import org.apache.spark.sql.catalyst.analysis.UnresolvedException
    -import org.apache.spark.sql.catalyst.types.{BinaryType, BooleanType, DataType, StringType}
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.commons.lang.StringUtils
    --- End diff --
    
    It will be nicer if you can provide a micro-benchmark comparison. :), and also the regex version.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61615861
  
    @tianyi  thanks for your review and comment : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61230476
  
      [Test build #22606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22606/consoleFull) for   PR 2998 at commit [`b7790f4`](https://github.com/apache/spark/commit/b7790f4c8d700d8b59b06aea80b249421c207e2c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait StringTransformationExpression `
      * `trait StringCalculationExpression `
      * `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
      * `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61230479
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22606/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61590871
  
      [Test build #22855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22855/consoleFull) for   PR 2998 at commit [`5989358`](https://github.com/apache/spark/commit/598935896e44d2ce0d0f4d68defb8ef3c7e3d805).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait StringTransformationExpression `
      * `trait StringCalculationExpression `
      * `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
      * `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
      * `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61437582
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22793/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61156724
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61227165
  
      [Test build #22606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22606/consoleFull) for   PR 2998 at commit [`b7790f4`](https://github.com/apache/spark/commit/b7790f4c8d700d8b59b06aea80b249421c207e2c).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61435627
  
      [Test build #22793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22793/consoleFull) for   PR 2998 at commit [`dca6adb`](https://github.com/apache/spark/commit/dca6adb4f4e2fb8f55e9b165510acf787af0ad08).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19717586
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -23,7 +23,8 @@ import scala.collection.IndexedSeqOptimized
     
     
     import org.apache.spark.sql.catalyst.analysis.UnresolvedException
    -import org.apache.spark.sql.catalyst.types.{BinaryType, BooleanType, DataType, StringType}
    +import org.apache.spark.sql.catalyst.types._
    +import org.apache.commons.lang.StringUtils
    --- End diff --
    
    @marmbrus  I'm not sure which is more faster, may be could be the same. 
    But I got your idea is do not add  external dependency libs into catalyst if only they have significantly improvement.
    The native implementation can be done with ```scala.collection.immutable.StringOps.```
    eg:
    ```
    ltrim : str.dropWhile ( _ == ' ')
    rtrim : str.reverse.dropWhile(_ == ' ').reverse 
    ```
    So I will change this and retest it . Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19654215
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
       override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
     }
     
    +
    +/**
    + * A function that strip whitespace (or other characters) from the beginning of a string
    + */
    +case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("^\\s+", "")
    +
    +    override def toString() = s"Ltrim($child)"
    +}
    +
    +/**
    + * A function that strip whitespace (or other characters) from the end of a string
    + */
    +case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
    +
    +    override def convert(v: String): String = v.replaceAll("\\s+$", "")
    --- End diff --
    
    yes, I will change the implementation to StringUtils.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by tianyi <gi...@git.apache.org>.
Github user tianyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19781336
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -49,7 +49,10 @@ trait StringRegexExpression {
         Pattern.compile(escape(str))
       }
     
    -  protected def pattern(str: String) = if(cache == null) compile(str) else cache
    +    /** Returns a string representation of the nodes in this tree */
    +    override def treeString = ???
    +
    +    protected def pattern(str: String) = if(cache == null) compile(str) else cache
    --- End diff --
    
    indent issue


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2998#issuecomment-61594045
  
      [Test build #22861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22861/consoleFull) for   PR 2998 at commit [`addfbd9`](https://github.com/apache/spark/commit/addfbd97667ca3da712a32325928c23b8827a307).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...

Posted by tianyi <gi...@git.apache.org>.
Github user tianyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2998#discussion_r19781741
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
    @@ -49,7 +49,10 @@ trait StringRegexExpression {
         Pattern.compile(escape(str))
       }
     
    -  protected def pattern(str: String) = if(cache == null) compile(str) else cache
    +    /** Returns a string representation of the nodes in this tree */
    +    override def treeString = ???
    --- End diff --
    
    what is it used for?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org