You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by OopsOutOfMemory <gi...@git.apache.org> on 2014/10/29 13:09:35 UTC
[GitHub] spark pull request: [SQL] Add string operation function trim, ltri...
GitHub user OopsOutOfMemory opened a pull request:
https://github.com/apache/spark/pull/2998
[SQL] Add string operation function trim, ltrim, rtrim, length to support SparkSql (HiveQL)
@marmbrus @chenghao-intel
Add three string operation functions to support spark sql and hiveql.
eg:
sql("select trim(' a b ') from src ").collect() --> 'a b'
sql("select ltrim(' a b ') from src ").collect() --> 'a b '
sql("select rtrim(' a b ') from src ").collect() --> ' a b'
sql("select length('ab') from src ").collect() --> 2
And Rename the trait of stringOperations.scala.
I prefer to rename trait CaseConversionExpression to StringTransformationExpression, it is more make sence than before so that this trait can support more string transformation but not only caseconversion.
And also add a trait StringCalculationExpression that do string computation like length, indexof etc....
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/OopsOutOfMemory/spark sparksql
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2998.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2998
----
commit 58094965a090f9dfd9271b2183f57c1d03e31d62
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-28T08:43:50Z
Add description of SQL CLI and HiveServer.
commit c577589e93441e2fa3577e92007b7def546c9d9e
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-28T08:48:12Z
Merge branch 'master' of https://github.com/apache/spark
commit e654704cf13c3081bc01d2f1a00599c07c5a9a2e
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T02:06:37Z
Merge branch 'master' of https://github.com/apache/spark
commit 3269eab23ca8adfc746ced8ddef391f51af43677
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T06:55:15Z
Add system function trim
commit ce6899abe365ade3b776bd65e110f0a7e679df3e
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T07:31:51Z
HiveQL support trim
commit e2781ee7ff0a32fba7854cdd618a37e20c829e7d
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T07:49:41Z
modify keyword of Trim
commit 2166c7765abea43d86ccf3df4b891f5949998241
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T08:23:03Z
correct spelling mistake
commit a4f4e4b5d119a4479c7594987e364de23e89ee16
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T08:27:31Z
correct spelling mistake
commit 2137e28f5e56c7f2f203dacfd1b9eca2b8083c17
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T08:59:25Z
add test suit for trim
commit 10d8ace1633dd52305fe0fcab9fe6fcb1ddd55d8
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T09:36:01Z
support 3 functions : ltrim rtrim length in SparkQL and HiveQl
commit 0a0f4e0aa922453f1a8f7a0cbc5b3349de798075
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T09:58:39Z
change the name of the trait CaseConversionExpression to StringTransformationExpression and add a StringCalculationExpression for string calculation in stringOperation.scala , eg: trim is for transformation and length is for calculation
commit 0fa2cd6e986bc0c56abbf6f0e1e86491db6aecf0
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T10:05:54Z
change return type
commit b358048d6e0598238c866cd6849145da4610e2b7
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T11:47:27Z
deleted: sql/README.md
commit 558d7bf37a154c17ea7d5af1daddd11255938afd
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T11:48:48Z
new file: sql/README.md
commit ab29a7e892b09e01171fcfd217f2cfb80d66d651
Author: OopsOutOfMemory <vi...@126.com>
Date: 2014-10-29T11:50:39Z
Merge branch 'sparksql' of https://github.com/OopsOutOfMemory/spark into sparksql
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19653069
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
}
+
+/**
+ * A function that strip whitespace (or other characters) from the beginning of a string
+ */
+case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("^\\s+", "")
--- End diff --
I am not so sure if the regex works here, but another option is likely to be use the `org.apache.commons.lang.StringUtils`, see https://github.com/apache/hive/blob/b8250ac2f30539f6b23ce80a20a9e338d3d31458/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19654021
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
}
+
+/**
+ * A function that strip whitespace (or other characters) from the beginning of a string
+ */
+case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("^\\s+", "")
+
+ override def toString() = s"Ltrim($child)"
+}
+
+/**
+ * A function that strip whitespace (or other characters) from the end of a string
+ */
+case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("\\s+$", "")
+
+ override def toString() = s"Rtrim($child)"
+}
+
+/**
+ * A function that calculate the length of a string
+ */
+case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression {
+
--- End diff --
so it is.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SQL] Add string operation function trim, ltri...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-60912669
Can one of the admins verify this patch?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19652932
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
}
+
+/**
+ * A function that strip whitespace (or other characters) from the beginning of a string
+ */
+case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("^\\s+", "")
+
+ override def toString() = s"Ltrim($child)"
+}
+
+/**
+ * A function that strip whitespace (or other characters) from the end of a string
+ */
+case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("\\s+$", "")
+
+ override def toString() = s"Rtrim($child)"
+}
+
+/**
+ * A function that calculate the length of a string
+ */
+case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression {
+
--- End diff --
You have to override the `override def dataType=IntegerType`, it's StringType by default. That's why it causes failure in the unittest.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61591017
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22856/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by tianyi <gi...@git.apache.org>.
Github user tianyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19781402
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -500,6 +500,23 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll {
(2, "abc"),
(3, null)))
}
+ test("system function trim()") {
+ checkAnswer(
+ sql("SELECT N,TRIM(L) FROM untrimmedData"),
+ Seq(
+ (1, "Good"),
+ (2, "To"),
+ (3, "See"),
+ (4, "You !")
+ ))
+
+ checkAnswer(
+ sql("SELECT n, TRIM(s) FROM nullStrings"),
+ Seq(
--- End diff --
indent issue
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61597430
[Test build #22861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22861/consoleFull) for PR 2998 at commit [`addfbd9`](https://github.com/apache/spark/commit/addfbd97667ca3da712a32325928c23b8827a307).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `trait StringTransformationExpression `
* `trait StringCalculationExpression `
* `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
* `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-67644023
@marmbrus
sorry for comment it late.
yeah, I agree with you~
:)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19653967
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -92,6 +92,29 @@ trait CaseConversionExpression {
}
}
+
+trait StringCalculationExpression {
--- End diff --
@chenghao-intel yep, what I mean is to add a trait to support return type of Integer. Because function like LENGTH(n) which compute the size of a string return Integer, INSTR(chr1,chr2,[n,[m]]) which compute a position also return Integer. But I'm not sure whether this will cause some side effect or not.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19653172
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -92,6 +92,29 @@ trait CaseConversionExpression {
}
}
+
+trait StringCalculationExpression {
--- End diff --
This trait seems only used by `Length`, how about just merge them? I think we can extract the trait in the future if we have more string operators returns `IntegerType`. Sorry not sure if you're planning on this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61227669
@chenghao-intel thanks for your review and comment :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61157614
[Test build #22550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22550/consoleFull) for PR 2998 at commit [`ab29a7e`](https://github.com/apache/spark/commit/ab29a7e892b09e01171fcfd217f2cfb80d66d651).
* This patch merges cleanly.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61222708
@OopsOutOfMemory thank you working on this, it will be nice if we have those functions. I have some comments on it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61162273
[Test build #22550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22550/consoleFull) for PR 2998 at commit [`ab29a7e`](https://github.com/apache/spark/commit/ab29a7e892b09e01171fcfd217f2cfb80d66d651).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `trait StringTransformationExpression `
* `trait StringCalculationExpression `
* `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
* `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61437581
[Test build #22793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22793/consoleFull) for PR 2998 at commit [`dca6adb`](https://github.com/apache/spark/commit/dca6adb4f4e2fb8f55e9b165510acf787af0ad08).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `trait StringTransformationExpression `
* `trait StringCalculationExpression `
* `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
* `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19653994
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
}
+
+/**
+ * A function that strip whitespace (or other characters) from the beginning of a string
+ */
+case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("^\\s+", "")
--- End diff --
Good idea, it's more mature to use the commons-lang jar than the regex expression.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19716628
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -23,7 +23,8 @@ import scala.collection.IndexedSeqOptimized
import org.apache.spark.sql.catalyst.analysis.UnresolvedException
-import org.apache.spark.sql.catalyst.types.{BinaryType, BooleanType, DataType, StringType}
+import org.apache.spark.sql.catalyst.types._
+import org.apache.commons.lang.StringUtils
--- End diff --
Is there a reason to add this dependency to catalyst? Doesn't Scala have these string functions natively? Are these significantly faster?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61587525
[Test build #22855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22855/consoleFull) for PR 2998 at commit [`5989358`](https://github.com/apache/spark/commit/598935896e44d2ce0d0f4d68defb8ef3c7e3d805).
* This patch merges cleanly.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61591014
[Test build #22856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22856/consoleFull) for PR 2998 at commit [`0925b32`](https://github.com/apache/spark/commit/0925b323a0265bc632e334b9469ece4213ee7616).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `trait StringTransformationExpression `
* `trait StringCalculationExpression `
* `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
* `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19717734
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -23,7 +23,8 @@ import scala.collection.IndexedSeqOptimized
import org.apache.spark.sql.catalyst.analysis.UnresolvedException
-import org.apache.spark.sql.catalyst.types.{BinaryType, BooleanType, DataType, StringType}
+import org.apache.spark.sql.catalyst.types._
+import org.apache.commons.lang.StringUtils
--- End diff --
@chenghao-intel it's a good advice, this also should look into the their implementations. I will do it later :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by tianyi <gi...@git.apache.org>.
Github user tianyi commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61582413
You have added a empty file "case sensitivity" in golden files, is it related to this PR?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/2998
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19653078
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
}
+
+/**
+ * A function that strip whitespace (or other characters) from the beginning of a string
+ */
+case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("^\\s+", "")
+
+ override def toString() = s"Ltrim($child)"
+}
+
+/**
+ * A function that strip whitespace (or other characters) from the end of a string
+ */
+case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("\\s+$", "")
--- End diff --
Use `org.apache.commons.lang.StringUtils` instead?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61359340
@marmbrus test passed, this can be merged.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61587664
[Test build #22856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22856/consoleFull) for PR 2998 at commit [`0925b32`](https://github.com/apache/spark/commit/0925b323a0265bc632e334b9469ece4213ee7616).
* This patch merges cleanly.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61162284
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22550/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61597433
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22861/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-67377479
Thanks for working on this, but I'm afraid that our current approach to adding functions is becoming unsustainable. I've detailed the reasons in SPARK-4867. For this reason, I propose we close this issue for now and reopen it once that work is complete. What do you think?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61590872
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22855/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19653194
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -92,6 +92,29 @@ trait CaseConversionExpression {
}
}
+
+trait StringCalculationExpression {
+ self: UnaryExpression =>
+
+ type EvaluatedType = Any
+
+ def calc(v: String): Int
+
+ override def foldable: Boolean = child.foldable
+ def nullable: Boolean = child.nullable
+ def dataType: DataType = StringType
--- End diff --
`IntegerType` instead.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19717646
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -23,7 +23,8 @@ import scala.collection.IndexedSeqOptimized
import org.apache.spark.sql.catalyst.analysis.UnresolvedException
-import org.apache.spark.sql.catalyst.types.{BinaryType, BooleanType, DataType, StringType}
+import org.apache.spark.sql.catalyst.types._
+import org.apache.commons.lang.StringUtils
--- End diff --
It will be nicer if you can provide a micro-benchmark comparison. :), and also the regex version.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61615861
@tianyi thanks for your review and comment : )
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61230476
[Test build #22606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22606/consoleFull) for PR 2998 at commit [`b7790f4`](https://github.com/apache/spark/commit/b7790f4c8d700d8b59b06aea80b249421c207e2c).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `trait StringTransformationExpression `
* `trait StringCalculationExpression `
* `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
* `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61230479
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22606/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61590871
[Test build #22855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22855/consoleFull) for PR 2998 at commit [`5989358`](https://github.com/apache/spark/commit/598935896e44d2ce0d0f4d68defb8ef3c7e3d805).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `trait StringTransformationExpression `
* `trait StringCalculationExpression `
* `case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Length(child: Expression) extends UnaryExpression with StringCalculationExpression `
* `case class Trim(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Upper(child: Expression) extends UnaryExpression with StringTransformationExpression `
* `case class Lower(child: Expression) extends UnaryExpression with StringTransformationExpression `
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61437582
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22793/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61156724
ok to test
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61227165
[Test build #22606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22606/consoleFull) for PR 2998 at commit [`b7790f4`](https://github.com/apache/spark/commit/b7790f4c8d700d8b59b06aea80b249421c207e2c).
* This patch merges cleanly.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61435627
[Test build #22793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22793/consoleFull) for PR 2998 at commit [`dca6adb`](https://github.com/apache/spark/commit/dca6adb4f4e2fb8f55e9b165510acf787af0ad08).
* This patch merges cleanly.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19717586
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -23,7 +23,8 @@ import scala.collection.IndexedSeqOptimized
import org.apache.spark.sql.catalyst.analysis.UnresolvedException
-import org.apache.spark.sql.catalyst.types.{BinaryType, BooleanType, DataType, StringType}
+import org.apache.spark.sql.catalyst.types._
+import org.apache.commons.lang.StringUtils
--- End diff --
@marmbrus I'm not sure which is more faster, may be could be the same.
But I got your idea is do not add external dependency libs into catalyst if only they have significantly improvement.
The native implementation can be done with ```scala.collection.immutable.StringOps.```
eg:
```
ltrim : str.dropWhile ( _ == ' ')
rtrim : str.reverse.dropWhile(_ == ' ').reverse
```
So I will change this and retest it . Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by OopsOutOfMemory <gi...@git.apache.org>.
Github user OopsOutOfMemory commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19654215
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -134,10 +157,51 @@ case class RLike(left: Expression, right: Expression)
override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).find(0)
}
+
+/**
+ * A function that strip whitespace (or other characters) from the beginning of a string
+ */
+case class Ltrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("^\\s+", "")
+
+ override def toString() = s"Ltrim($child)"
+}
+
+/**
+ * A function that strip whitespace (or other characters) from the end of a string
+ */
+case class Rtrim(child: Expression) extends UnaryExpression with StringTransformationExpression {
+
+ override def convert(v: String): String = v.replaceAll("\\s+$", "")
--- End diff --
yes, I will change the implementation to StringUtils.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by tianyi <gi...@git.apache.org>.
Github user tianyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19781336
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -49,7 +49,10 @@ trait StringRegexExpression {
Pattern.compile(escape(str))
}
- protected def pattern(str: String) = if(cache == null) compile(str) else cache
+ /** Returns a string representation of the nodes in this tree */
+ override def treeString = ???
+
+ protected def pattern(str: String) = if(cache == null) compile(str) else cache
--- End diff --
indent issue
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2998#issuecomment-61594045
[Test build #22861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22861/consoleFull) for PR 2998 at commit [`addfbd9`](https://github.com/apache/spark/commit/addfbd97667ca3da712a32325928c23b8827a307).
* This patch merges cleanly.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4151][SQL] Add string operation functio...
Posted by tianyi <gi...@git.apache.org>.
Github user tianyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/2998#discussion_r19781741
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -49,7 +49,10 @@ trait StringRegexExpression {
Pattern.compile(escape(str))
}
- protected def pattern(str: String) = if(cache == null) compile(str) else cache
+ /** Returns a string representation of the nodes in this tree */
+ override def treeString = ???
--- End diff --
what is it used for?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org