Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/10 10:26:01 UTC
[GitHub] [spark] beliefer opened a new pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
beliefer opened a new pull request #29999:
URL: https://github.com/apache/spark/pull/29999
### What changes were proposed in this pull request?
Spark already supports the `LIKE ALL` syntax, but it throws a `StackOverflowError` when the pattern list has many elements (more than 14378). We should implement a built-in function for `LIKE ALL` to fix this issue.
### Why are the changes needed?
1. Fix the `StackOverflowError` issue.
2. Support the built-in function `like_all`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Jenkins test.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-707634498
@maropu We can reproduce the `java.lang.StackOverflowError` in this way:
```scala
spark.sql("create table SPARK_33045(id string) using parquet")
val values = Range(1, 10000)
spark.sql(s"select * from SPARK_33045 where id like all (${values.mkString(", ")})").show
```
This is because we rewrite `LIKE ALL`/`LIKE ANY` into a chain of `LIKE` predicates:
```scala
spark.sql(s"select * from SPARK_33045 where ${values.map(i => s"id like $i").mkString(" and ")}").show
```
The `In` predicate, by contrast, does not throw `java.lang.StackOverflowError`:
```scala
spark.sql(s"select * from SPARK_33045 where id in (${values.mkString(", ")})").show
```
So I think we can implement built-in `LIKE ANY` and `LIKE ALL` UDFs, similar to the `In` predicate, to fix this issue.
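To illustrate why the chained rewrite is fragile while an n-ary node is not, here is a minimal sketch with toy expression types. `Expr`, `Leaf`, `And`, `InList` and `TreeDepth` are hypothetical stand-ins, not Catalyst classes. Folding the patterns with `reduceLeft` builds a left-deep tree whose depth grows with the number of patterns, so a recursive traversal needs roughly one stack frame per pattern; a single node that holds the whole list keeps the depth constant.

```scala
// Toy model only: these types are illustrative stand-ins, not Spark's Catalyst classes.
sealed trait Expr
case class Leaf(pattern: String) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class InList(patterns: Seq[String]) extends Expr

object TreeDepth {
  // Depth is computed with an explicit stack so the measurement itself cannot overflow.
  def depth(root: Expr): Int = {
    var maxDepth = 0
    var stack: List[(Expr, Int)] = List((root, 1))
    while (stack.nonEmpty) {
      val (node, d) = stack.head
      stack = stack.tail
      if (d > maxDepth) maxDepth = d
      node match {
        case And(l, r) => stack = (l, d + 1) :: (r, d + 1) :: stack
        case _ => ()
      }
    }
    maxDepth
  }

  def main(args: Array[String]): Unit = {
    val patterns = (1 until 10000).map(_.toString)
    val leaves: Seq[Expr] = patterns.map(p => Leaf(p))
    // Mirrors the shape of `id like 1 and id like 2 and ...`.
    val chained: Expr = leaves.reduceLeft[Expr]((l, r) => And(l, r))
    // Mirrors the shape of an n-ary predicate such as `id in (1, 2, ...)`.
    val flat: Expr = InList(patterns)
    println(s"chained depth = ${depth(chained)}, flat depth = ${depth(flat)}")
  }
}
```

The sketch only measures tree shape; the depth at which a real query overflows depends on the JVM stack size and on which analyzer or optimizer rules recurse over the tree.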
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r505197069
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,125 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ StringType +: Seq.fill(children.size - 1)(StringType)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ var hasNull = false
+ var match = true
+ list.foreach { e =>
+ val str = e.eval(input)
+ if (str == null) {
+ hasNull = true
+ } else {
+ val regex =
+ Pattern.compile(StringUtils.escapeLikeRegex(str.asInstanceOf[UTF8String].toString, '\\'))
+ if ((isNot && matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) ||
+ !(isNot || matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
Review comment:
Can we store the result of `matches` in a local variable to shorten the code here?
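A minimal sketch of that suggestion, outside Spark: evaluate the match once, bind it to a local value, and compare it with `isNot`. The two-clause condition in the hunk is true exactly when `isNot` equals the match result, so the sketch uses that equality and checks it against the original form. `MatchLocalVar`, `originalCondition` and `failsPattern` are hypothetical names, and the pattern is passed as a plain Java regex rather than a SQL `LIKE` pattern.

```scala
import java.util.regex.Pattern

object MatchLocalVar {
  // The condition as written in the hunk, with the match result passed in.
  def originalCondition(isNot: Boolean, matched: Boolean): Boolean =
    (isNot && matched) || !(isNot || matched)

  // True when this pattern makes the overall (NOT) LIKE ALL predicate false.
  def failsPattern(isNot: Boolean, regex: Pattern, value: String): Boolean = {
    val matched = regex.matcher(value).matches() // evaluated once, reused below
    isNot == matched
  }

  def main(args: Array[String]): Unit = {
    // Exhaustive check that the shorter form has the same truth table.
    for (isNot <- Seq(false, true); matched <- Seq(false, true)) {
      assert(originalCondition(isNot, matched) == (isNot == matched))
    }
    val regex = Pattern.compile("sp.*")
    println(failsPattern(isNot = false, regex, "spark"))
    println(failsPattern(isNot = true, regex, "spark"))
  }
}
```

Besides being shorter, the local value avoids running the matcher twice for the same pattern and input.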
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r505199510
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,125 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ StringType +: Seq.fill(children.size - 1)(StringType)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ var hasNull = false
+ var match = true
+ list.foreach { e =>
+ val str = e.eval(input)
+ if (str == null) {
+ hasNull = true
+ } else {
+ val regex =
+ Pattern.compile(StringUtils.escapeLikeRegex(str.asInstanceOf[UTF8String].toString, '\\'))
+ if ((isNot && matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) ||
+ !(isNot || matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+ match = false
+ }
+ }
+ }
+ if (hasNull) {
+ null
+ } else {
+ match
+ }
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val patternClass = classOf[Pattern].getName
+ val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex"
+ val javaDataType = CodeGenerator.javaType(value.dataType)
+ val valueGen = value.genCode(ctx)
+ val listGen = list.map(_.genCode(ctx))
+ val pattern = ctx.freshName("pattern")
+ val rightStr = ctx.freshName("rightStr")
+ val escapedEscapeChar = StringEscapeUtils.escapeJava("\\")
+ val hasNull = ctx.freshName("hasNull")
+ val matched = ctx.freshName("matched")
+ val valueArg = ctx.freshName("valueArg")
+ val listCode = listGen.map(x =>
+ s"""
+ |${x.code}
+ |if (${x.isNull}) {
+ | $hasNull = true; // ${ev.isNull} = true;
+ |} else if (!$hasNull && $matched) {
+ | String $rightStr = ${x.value}.toString();
+ | $patternClass $pattern =
+ | $patternClass.compile($escapeFunc($rightStr, '$escapedEscapeChar'));
Review comment:
This might cause a perf regression.
In the `Like` expression, we build the regex only once if the regex string is foldable.
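A minimal sketch of the concern, outside Spark's codegen: when the pattern strings are constants, the regexes can be compiled once and reused for every row instead of being recompiled per row. `LikeRegex`, `likeToRegex`, `PrecompiledLikeAll` and `PrecompiledDemo` are hypothetical names, and `likeToRegex` is a simplified stand-in for `StringUtils.escapeLikeRegex` that only translates `%` and `_` and ignores escape characters.

```scala
import java.util.regex.Pattern

// Hypothetical helper: simplified LIKE-to-regex translation (no escape-character support).
object LikeRegex {
  def likeToRegex(like: String): String = {
    val sb = new StringBuilder("(?s)")
    like.foreach {
      case '%' => sb.append(".*")
      case '_' => sb.append(".")
      case c => sb.append(Pattern.quote(c.toString))
    }
    sb.toString
  }
}

// Patterns are compiled once, at construction, and reused for every input value.
final class PrecompiledLikeAll(likePatterns: Seq[String]) {
  private val compiled: Seq[Pattern] =
    likePatterns.map(p => Pattern.compile(LikeRegex.likeToRegex(p)))

  def matchesAll(value: String): Boolean =
    compiled.forall(_.matcher(value).matches())
}

object PrecompiledDemo {
  def main(args: Array[String]): Unit = {
    val pred = new PrecompiledLikeAll(Seq("sp%", "%ark", "s_ark"))
    val rows = Seq("spark", "shark", "spork")
    rows.foreach(r => println(s"$r -> ${pred.matchesAll(r)}"))
  }
}
```

This only works when every pattern is known before the first row is processed, which is why the foldable check matters; non-constant patterns would still need per-row compilation.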
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725256865
**[Test build #130900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130900/testReport)** for PR 29999 at commit [`15bac5b`](https://github.com/apache/spark/commit/15bac5bfecb209ba7b6963d83423b659fbc5086d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-712150902
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730203655
**[Test build #131331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131331/testReport)** for PR 29999 at commit [`001eb38`](https://github.com/apache/spark/commit/001eb38f603267c6a6f4e1c25430b8900644f5b7).
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706527357
**[Test build #129623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129623/testReport)** for PR 29999 at commit [`4163382`](https://github.com/apache/spark/commit/41633827583d6f0d91e0e48b781c25c95ec06765).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708916521
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129802/
Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706863638
Merged build finished. Test FAILed.
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r524852125
##########
File path: sql/core/src/test/resources/sql-functions/sql-expression-schema.md
##########
@@ -346,4 +346,4 @@
| org.apache.spark.sql.catalyst.expressions.xml.XPathList | xpath | SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()') | struct<xpath(<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>, a/b/text()):array<string>> |
| org.apache.spark.sql.catalyst.expressions.xml.XPathLong | xpath_long | SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)') | struct<xpath_long(<a><b>1</b><b>2</b></a>, sum(a/b)):bigint> |
| org.apache.spark.sql.catalyst.expressions.xml.XPathShort | xpath_short | SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)') | struct<xpath_short(<a><b>1</b><b>2</b></a>, sum(a/b)):smallint> |
-| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') | struct<xpath_string(<a><b>b</b><c>cc</c></a>, a/c):string> |
\ No newline at end of file
+| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') | struct<xpath_string(<a><b>b</b><c>cc</c></a>, a/c):string> |
Review comment:
I reverted it.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706841427
**[Test build #129659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129659/testReport)** for PR 29999 at commit [`a7cd416`](https://github.com/apache/spark/commit/a7cd416f40308cfc841fb0c7210728e69ba4ac1e).
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r521160412
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -1408,7 +1408,20 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
case Some(SqlBaseParser.ANY) | Some(SqlBaseParser.SOME) =>
getLikeQuantifierExprs(ctx.expression).reduceLeft(Or)
case Some(SqlBaseParser.ALL) =>
- getLikeQuantifierExprs(ctx.expression).reduceLeft(And)
+ validate(!ctx.expression.isEmpty, "Expected something between '(' and ')'.", ctx)
+ val expressions = ctx.expression.asScala.map(expression)
+ if (expressions.size > 200 && expressions.forall(_.foldable)) {
Review comment:
OK
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-726785967
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709702627
**[Test build #129872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129872/testReport)** for PR 29999 at commit [`be5eb8a`](https://github.com/apache/spark/commit/be5eb8a1f092e15c941d39d517284aed67de72c9).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709988169
**[Test build #129894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129894/testReport)** for PR 29999 at commit [`f657ff0`](https://github.com/apache/spark/commit/f657ff0372f1cac48ea008a08c1cc7011f934d98).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706919391
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708868898
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34387/
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-715204127
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34798/
[GitHub] [spark] dongjoon-hyun commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706836333
I have the same impression as @maropu's first comment. Could you answer his question, @beliefer?
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706919945
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714227951
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34730/
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708878815
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34389/
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-711955492
**[Test build #129999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129999/testReport)** for PR 29999 at commit [`ad4d2d9`](https://github.com/apache/spark/commit/ad4d2d9cde81beff27c9eaadae77a132d59599cc).
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r522752648
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,90 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[Any]
Review comment:
OK
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728745887
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714215550
**[Test build #130125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130125/testReport)** for PR 29999 at commit [`55465b8`](https://github.com/apache/spark/commit/55465b8fcd5dbde93c23eae99d94fb877e9cb5f3).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714487188
**[Test build #130139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130139/testReport)** for PR 29999 at commit [`55465b8`](https://github.com/apache/spark/commit/55465b8fcd5dbde93c23eae99d94fb877e9cb5f3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `abstract class LikeAllBase extends Expression with ImplicitCastInputTypes`
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708930779
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708867172
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34385/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709984475
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728671247
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35793/
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730201079
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708935701
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34411/
Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724815285
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709984040
**[Test build #129884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129884/testReport)** for PR 29999 at commit [`f657ff0`](https://github.com/apache/spark/commit/f657ff0372f1cac48ea008a08c1cc7011f934d98).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728911196
**[Test build #131211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131211/testReport)** for PR 29999 at commit [`f0e3de1`](https://github.com/apache/spark/commit/f0e3de1718e99c887833f230c77c17c3851f9fc7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706869300
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34265/
[GitHub] [spark] beliefer commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728768216
retest this please
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708692143
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34363/
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725271482
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-712227544
**[Test build #129999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129999/testReport)** for PR 29999 at commit [`ad4d2d9`](https://github.com/apache/spark/commit/ad4d2d9cde81beff27c9eaadae77a132d59599cc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730203655
**[Test build #131331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131331/testReport)** for PR 29999 at commit [`001eb38`](https://github.com/apache/spark/commit/001eb38f603267c6a6f4e1c25430b8900644f5b7).
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r522786176
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -1408,7 +1408,20 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
case Some(SqlBaseParser.ANY) | Some(SqlBaseParser.SOME) =>
getLikeQuantifierExprs(ctx.expression).reduceLeft(Or)
case Some(SqlBaseParser.ALL) =>
- getLikeQuantifierExprs(ctx.expression).reduceLeft(And)
+ validate(!ctx.expression.isEmpty, "Expected something between '(' and ')'.", ctx)
+ val expressions = ctx.expression.asScala.map(expression)
+ if (expressions.size > SQLConf.get.optimizerLikeAllConversionThreshold &&
+ expressions.forall(_.foldable)) {
Review comment:
Yes.
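To illustrate the hunk above: folding n patterns with `reduceLeft(And)` yields a left-nested tree whose depth grows linearly with n, and recursive traversal of such a tree is presumably what exhausts the stack. Switching to a single flat node above a threshold keeps the depth constant. A minimal sketch, assuming nothing about Catalyst internals: `Expr`, `Like`, `And`, `LikeAll`, `depth`, and `build` are hypothetical stand-ins rather than Spark classes, and the `foldable` check from the diff is omitted.

```scala
object LikeAllTreeDepth {
  // Hypothetical miniature expression tree; not the Catalyst classes.
  sealed trait Expr
  final case class Like(pattern: String) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class LikeAll(patterns: Seq[String]) extends Expr

  // Iterative depth measurement, so the measurement itself cannot overflow the stack.
  def depth(e: Expr): Int = {
    var maxDepth = 0
    var stack: List[(Expr, Int)] = List((e, 1))
    while (stack.nonEmpty) {
      val (node, d) = stack.head
      stack = stack.tail
      maxDepth = math.max(maxDepth, d)
      node match {
        case And(l, r) => stack = (l, d + 1) :: (r, d + 1) :: stack
        case _ => ()
      }
    }
    maxDepth
  }

  // Above the (hypothetical) threshold, emit one flat node instead of a chain of And nodes.
  def build(patterns: Seq[String], threshold: Int): Expr = {
    require(patterns.nonEmpty, "Expected something between '(' and ')'.")
    if (patterns.size > threshold) {
      LikeAll(patterns)
    } else {
      val likes: Seq[Expr] = patterns.map(p => Like(p))
      likes.reduceLeft[Expr](And(_, _))
    }
  }
}
```

With 1000 patterns, the `reduceLeft` form has depth 1000, while the flat form has depth 1 regardless of the pattern count.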
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728769754
**[Test build #131211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131211/testReport)** for PR 29999 at commit [`f0e3de1`](https://github.com/apache/spark/commit/f0e3de1718e99c887833f230c77c17c3851f9fc7).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-712149566
**[Test build #129997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129997/testReport)** for PR 29999 at commit [`8df5231`](https://github.com/apache/spark/commit/8df52316a1bb4bbeab427dd165b23addfaa3b859).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728660048
**[Test build #131191 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131191/testReport)** for PR 29999 at commit [`f0e3de1`](https://github.com/apache/spark/commit/f0e3de1718e99c887833f230c77c17c3851f9fc7).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706858224
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34263/
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728649075
**[Test build #131189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131189/testReport)** for PR 29999 at commit [`97c1c73`](https://github.com/apache/spark/commit/97c1c7389e537f0d38f1b6a17bbe9ba70c9bc6ea).
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r526618529
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,89 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[UTF8String]
+
+ protected def isNotLikeAll: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ val exprValue = child.eval(input)
+ if (exprValue == null) {
+ null
+ } else {
+ val allMatched = if (isNotLikeAll) {
+ !cache.exists(p => p.matcher(exprValue.toString).matches())
+ } else {
+ cache.forall(p => p.matcher(exprValue.toString).matches())
+ }
+ if (allMatched && hasNull) {
+ null
+ } else {
+ allMatched
+ }
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val eval = child.genCode(ctx)
+ val patternClass = classOf[Pattern].getName
+ val javaDataType = CodeGenerator.javaType(child.dataType)
+ val pattern = ctx.freshName("pattern")
+ val allMatched = ctx.freshName("allMatched")
+ val valueIsNull = ctx.freshName("valueIsNull")
+ val valueArg = ctx.freshName("valueArg")
+ val patternCache = ctx.addReferenceObj("patternCache", cache.asJava)
+
+ val matchCode = if (isNotLikeAll) {
+ s"$pattern.matcher($valueArg.toString()).matches()"
+ } else {
+ s"!$pattern.matcher($valueArg.toString()).matches()"
+ }
+
+ ev.copy(code =
+ code"""
+ |${eval.code}
+ |boolean $allMatched = true;
+ |boolean $valueIsNull = false;
+ |if (${eval.isNull}) {
+ | $valueIsNull = true;
+ |} else {
+ | $javaDataType $valueArg = ${eval.value};
+ | for ($patternClass $pattern: $patternCache) {
Review comment:
Yeah! Thanks!
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714333774
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34746/
[GitHub] [spark] wangyum commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r521135433
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,86 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[Any]
+
+ protected def isNotDefined: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ if (hasNull) {
+ null
Review comment:
```sql
spark-sql> select 'a' like all ('%a%', null);
NULL
spark-sql> select 'a' not like all ('%a%', null);
false
```
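The two results above follow from SQL three-valued logic: a NULL pattern makes the outcome unknown only when every non-NULL pattern is satisfied, so returning NULL whenever `hasNull` is true gives the wrong answer for the `not like all` case. A minimal sketch of the intended semantics in plain Scala, with `None` modelling SQL NULL. The names `likeToRegex` and `likeAll` are hypothetical, and the pattern translation is simplified (no escape-character handling), so it is not a substitute for `StringUtils.escapeLikeRegex`.

```scala
import java.util.regex.Pattern

object LikeAllNullSemantics {
  // Hypothetical helper: translate a LIKE pattern into a regex.
  // '%' matches any sequence, '_' matches one character; escapes are not handled.
  def likeToRegex(p: String): Pattern = {
    val sb = new StringBuilder
    p.foreach {
      case '%' => sb.append(".*")
      case '_' => sb.append(".")
      case c => sb.append(Pattern.quote(c.toString))
    }
    Pattern.compile(sb.toString, Pattern.DOTALL)
  }

  // None models SQL NULL for the value, for each pattern, and for the result.
  def likeAll(
      value: Option[String],
      patterns: Seq[Option[String]],
      negated: Boolean = false): Option[Boolean] =
    value.flatMap { v =>
      val compiled = patterns.flatten.map(likeToRegex)
      val hasNull = patterns.contains(None)
      val allHold =
        if (negated) !compiled.exists(_.matcher(v).matches())
        else compiled.forall(_.matcher(v).matches())
      // A definite false wins over NULL; NULL only matters if everything else held.
      if (allHold && hasNull) None else Some(allHold)
    }
}
```

Under these assumptions, `likeAll(Some("a"), Seq(Some("%a%"), None))` yields `None`, and the same call with `negated = true` yields `Some(false)`, matching the two `spark-sql` results quoted above.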
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-710019726
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34499/
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-726638718
**[Test build #131050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131050/testReport)** for PR 29999 at commit [`7af8ffe`](https://github.com/apache/spark/commit/7af8ffe49fc02765a80a85faccaa7209fe8b9c57).
[GitHub] [spark] wangyum commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r503832375
##########
File path: sql/core/src/test/resources/sql-tests/inputs/regexp-functions.sql
##########
@@ -31,3 +31,13 @@ SELECT regexp_extract_all('1a 2b 14m', '(\\d+)([a-z]+)', 3);
SELECT regexp_extract_all('1a 2b 14m', '(\\d+)([a-z]+)', -1);
SELECT regexp_extract_all('1a 2b 14m', '(\\d+)?([a-z]+)', 1);
SELECT regexp_extract_all('a 2b 14m', '(\\d+)?([a-z]+)', 1);
+
+-- like_all
+SELECT like_all('foo', '%foo%', '%oo');
Review comment:
We already have a test file: https://github.com/apache/spark/blob/b10263b8e5106409467e0115968bbaf0b9141cd1/sql/core/src/test/resources/sql-tests/inputs/like-all.sql
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-712053647
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706919702
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725157090
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r526612173
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,89 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+  protected def patterns: Seq[UTF8String]
+
+  protected def isNotLikeAll: Boolean
+
+  override def inputTypes: Seq[DataType] = StringType :: Nil
+
+  override def dataType: DataType = BooleanType
+
+  override def nullable: Boolean = true
+
+  private lazy val hasNull: Boolean = patterns.contains(null)
+
+  private lazy val cache = patterns.filterNot(_ == null)
+    .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+  override def eval(input: InternalRow): Any = {
+    val exprValue = child.eval(input)
+    if (exprValue == null) {
+      null
+    } else {
+      val allMatched = if (isNotLikeAll) {
+        !cache.exists(p => p.matcher(exprValue.toString).matches())
+      } else {
+        cache.forall(p => p.matcher(exprValue.toString).matches())
+      }
+      if (allMatched && hasNull) {
+        null
+      } else {
+        allMatched
+      }
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val eval = child.genCode(ctx)
+    val patternClass = classOf[Pattern].getName
+    val javaDataType = CodeGenerator.javaType(child.dataType)
+    val pattern = ctx.freshName("pattern")
+    val allMatched = ctx.freshName("allMatched")
+    val valueIsNull = ctx.freshName("valueIsNull")
+    val valueArg = ctx.freshName("valueArg")
+    val patternCache = ctx.addReferenceObj("patternCache", cache.asJava)
+
+    val matchCode = if (isNotLikeAll) {
+      s"$pattern.matcher($valueArg.toString()).matches()"
+    } else {
+      s"!$pattern.matcher($valueArg.toString()).matches()"
+    }
+
+    ev.copy(code =
+      code"""
+         |${eval.code}
+         |boolean $allMatched = true;
Review comment:
`notMatched`
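For readers following the null handling in the `eval` method quoted above, here is a minimal standalone sketch of the same three-valued logic, written without any Spark dependency. It is an illustration under stated assumptions, not the PR's implementation: `LikeAllSketch`, `likeToRegex` and `likeAll` are hypothetical names, `Option` stands in for SQL NULL, and `likeToRegex` is a simplified substitute for Spark's `StringUtils.escapeLikeRegex`.

```scala
import java.util.regex.Pattern

object LikeAllSketch {
  // Hypothetical, simplified LIKE-to-regex translation (assumption, not Spark's code):
  // '%' matches any sequence, '_' matches one character, '\' escapes the next character.
  def likeToRegex(like: String): String = {
    val sb = new StringBuilder("(?s)")
    var i = 0
    while (i < like.length) {
      val c = like.charAt(i)
      if (c == '\\' && i + 1 < like.length) {
        // Escaped character: take the next character literally.
        sb.append(Pattern.quote(like.charAt(i + 1).toString))
        i += 1
      } else if (c == '%') {
        sb.append(".*")
      } else if (c == '_') {
        sb.append(".")
      } else {
        sb.append(Pattern.quote(c.toString))
      }
      i += 1
    }
    sb.toString
  }

  // None models SQL NULL, for both the input value and individual patterns.
  def likeAll(
      value: Option[String],
      patterns: Seq[Option[String]],
      negated: Boolean = false): Option[Boolean] = value match {
    case None => None // a NULL input yields NULL
    case Some(v) =>
      val compiled = patterns.flatten.map(p => Pattern.compile(likeToRegex(p)))
      val hasNull = patterns.contains(None)
      val ok =
        if (negated) !compiled.exists(_.matcher(v).matches())
        else compiled.forall(_.matcher(v).matches())
      // A NULL pattern only changes the result when every non-NULL pattern passed.
      if (ok && hasNull) None else Some(ok)
  }
}
```

With these definitions, `LikeAllSketch.likeAll(Some("foo"), Seq(Some("%foo%"), None))` evaluates to `None`, while replacing `%foo%` with `bar%` gives `Some(false)`, because a definite mismatch decides the result regardless of the NULL pattern.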
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708926914
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-726679865
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35656/
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709734770
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706841427
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706545946
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129623/
Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709717092
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34477/
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714215550
**[Test build #130125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130125/testReport)** for PR 29999 at commit [`55465b8`](https://github.com/apache/spark/commit/55465b8fcd5dbde93c23eae99d94fb877e9cb5f3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714237121
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706545942
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730200947
**[Test build #131326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131326/testReport)** for PR 29999 at commit [`001eb38`](https://github.com/apache/spark/commit/001eb38f603267c6a6f4e1c25430b8900644f5b7).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725157121
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35506/
Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708874795
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r520337999
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +179,142 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes {
Review comment:
The current implementation requires every expression in the list to be foldable, which includes literals. My earliest implementation of `nullSafeEval` also cached each compiled pattern, but after an offline discussion with @cloud-fan we concluded that this is unnecessary: when all patterns are literals, the current `doGenCode` already achieves the same effect.
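To make the caching point concrete, the following is a rough sketch of the difference between compiling literal patterns once and recompiling them for every row. It is not taken from the PR: `PatternCacheSketch`, `PrecompiledAll` and `matchesAllUncached` are hypothetical names, and the patterns are assumed to be plain Java regexes that are known up front, analogous to foldable (literal) patterns.

```scala
import java.util.regex.Pattern

object PatternCacheSketch {
  // The regexes are known before any row is evaluated, so they are compiled
  // lazily on first use and then reused for every subsequent row.
  final class PrecompiledAll(regexes: Seq[String]) {
    private lazy val cache: Seq[Pattern] = regexes.map(r => Pattern.compile(r))

    def matchesAll(value: String): Boolean =
      cache.forall(_.matcher(value).matches())
  }

  // Baseline for comparison: same result, but every regex is recompiled per call.
  def matchesAllUncached(regexes: Seq[String], value: String): Boolean =
    regexes.forall(r => Pattern.compile(r).matcher(value).matches())
}
```

Both variants return the same answers; the precompiled one just moves the `Pattern.compile` cost out of the per-row path, which is only possible when the patterns do not depend on the row.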
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706869315
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725360400
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35529/
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706858233
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714437145
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728677455
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35793/
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725274237
**[Test build #130924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130924/testReport)** for PR 29999 at commit [`d039c33`](https://github.com/apache/spark/commit/d039c33de33ea4bab4cea3170925c0c4f92ca771).
[GitHub] [spark] beliefer commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714287382
retest this please
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706855750
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34261/
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706849694
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34261/
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724622736
**[Test build #130860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130860/testReport)** for PR 29999 at commit [`1fc5214`](https://github.com/apache/spark/commit/1fc5214964a3a522f3cc0a1daf91ced342bb1b51).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708874781
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34387/
[GitHub] [spark] cloud-fan closed pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #29999:
URL: https://github.com/apache/spark/pull/29999
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708938150
**[Test build #129810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129810/testReport)** for PR 29999 at commit [`b770f92`](https://github.com/apache/spark/commit/b770f929594dd551a544fb6b0e5f9d4f2ddff7d4).
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r524213551
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,90 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+  protected def patterns: Seq[UTF8String]
+
+  protected def isNotDefined: Boolean
Review comment:
nit: `isNotLikeAll`
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708911515
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708787077
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708912448
**[Test build #129800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129800/testReport)** for PR 29999 at commit [`60f01f4`](https://github.com/apache/spark/commit/60f01f4edfbd112ea085e118c6a50f024c8c4dff).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708910768
**[Test build #129780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129780/testReport)** for PR 29999 at commit [`de65829`](https://github.com/apache/spark/commit/de658290b417645d4dd8b91bc1f2febb747e1f3b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708843179
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724674796
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35472/
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709925953
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706946167
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34276/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709984478
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129884/
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r520305996
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +179,142 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes {
Review comment:
Sounds reasonable. If there are more than 14378 elements, most likely they are literals.
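A rough sketch of why the literal-only assumption helps, under stated assumptions: `likeToRegex` is a hypothetical, simplified translation (no escape-character handling) and is not Spark's `StringUtils.escapeLikeRegex`; `CompiledLikeAll` is likewise a made-up name. When every pattern is a literal, each one can be compiled once and the list checked iteratively, so the depth of the work does not grow with the number of patterns.

```scala
import java.util.regex.Pattern

object LiteralLikeAllSketch {
  // Hypothetical, simplified LIKE-to-regex translation: '%' matches any run of
  // characters, '_' matches exactly one, everything else is matched literally.
  // Escape characters are deliberately not handled here.
  def likeToRegex(like: String): String =
    like.map {
      case '%' => ".*"
      case '_' => "."
      case c => Pattern.quote(c.toString)
    }.mkString("(?s)", "", "")

  // All patterns are literals, so they are compiled once, up front, and kept
  // in a flat sequence rather than in a deeply nested expression tree.
  final class CompiledLikeAll(patterns: Seq[String], isNot: Boolean) {
    private val compiled: Seq[Pattern] =
      patterns.map(p => Pattern.compile(likeToRegex(p)))

    // LIKE ALL needs every pattern to match;
    // NOT LIKE ALL needs every pattern to fail.
    def matchesAll(value: String): Boolean =
      compiled.forall(p => p.matcher(value).matches() != isNot)
  }
}
```

For example, `new LiteralLikeAllSketch.CompiledLikeAll(Seq.fill(20000)("%a%"), isNot = false).matchesAll("banana")` walks the 20000 patterns in a loop, with no recursion over an expression tree.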
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728758630
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131191/
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706855768
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708930785
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34409/
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708858723
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34385/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-707065749
[GitHub] [spark] beliefer commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725272877
retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725271482
Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730202933
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730190816
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35930/
[GitHub] [spark] beliefer commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706840400
> I have the same impression as @maropu's first comment. Could you answer his question, please, @beliefer?
Thanks for the reminder.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708939230
**[Test build #129810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129810/testReport)** for PR 29999 at commit [`b770f92`](https://github.com/apache/spark/commit/b770f929594dd551a544fb6b0e5f9d4f2ddff7d4).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-715174110
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34798/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-712150902
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709864778
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r505202752
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,125 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
Review comment:
OK
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709988169
**[Test build #129894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129894/testReport)** for PR 29999 at commit [`f657ff0`](https://github.com/apache/spark/commit/f657ff0372f1cac48ea008a08c1cc7011f934d98).
[GitHub] [spark] beliefer commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725081708
cc @cloud-fan @wangyum
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724685139
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725157049
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35506/
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724637618
**[Test build #130864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130864/testReport)** for PR 29999 at commit [`53406d3`](https://github.com/apache/spark/commit/53406d349a46dad7edf61e5eb2e27b11e92e508a).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724685139
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r504681811
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,195 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ val arrayOrStr = TypeCollection(ArrayType(StringType), StringType)
+ StringType +: Seq.fill(children.size - 1)(arrayOrStr)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = value.foldable && list.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def escape(v: String): String = StringUtils.escapeLikeRegex(v, '\\')
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ list.foreach { e =>
+ val str = e.eval(input)
+ if (str == null) {
+ return null
+ }
+ val regex = Pattern.compile(escape(str.asInstanceOf[UTF8String].toString))
+ if(regex == null) {
+ return null
+ } else if (isNot && matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+ return false
+ } else if (!isNot && !matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+ return false
+ }
+ }
+ return true
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val patternClass = classOf[Pattern].getName
+ val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex"
+ val javaDataType = CodeGenerator.javaType(value.dataType)
+ val valueGen = value.genCode(ctx)
+ val listGen = list.map(_.genCode(ctx))
+ val pattern = ctx.freshName("pattern")
+ val rightStr = ctx.freshName("rightStr")
+ val escapedEscapeChar = StringEscapeUtils.escapeJava("\\")
+ val hasNull = ctx.freshName("hasNull")
+ val matched = ctx.freshName("matched")
+ val valueArg = ctx.freshName("valueArg")
+ val listCode = listGen.map(x =>
+ s"""
+ |${x.code}
+ |if (${x.isNull}) {
+ | $hasNull = true; // ${ev.isNull} = true;
+ |} else if (!$hasNull && $matched) {
+ | String $rightStr = ${x.value}.toString();
+ | $patternClass $pattern =
+ | $patternClass.compile($escapeFunc($rightStr, '$escapedEscapeChar'));
+ | if ($isNot && $pattern.matcher($valueArg.toString()).matches()) {
+ | $matched = false;
+ | } else if (!$isNot && !$pattern.matcher($valueArg.toString()).matches()) {
+ | $matched = false;
+ | }
+ |}
+ """.stripMargin)
+
+ val resultType = CodeGenerator.javaType(dataType)
+ val codes = ctx.splitExpressionsWithCurrentInputs(
+ expressions = listCode,
+ funcName = "likeAll",
+ extraArguments = (javaDataType, valueArg) :: (CodeGenerator.JAVA_BOOLEAN, hasNull) ::
+ (resultType, matched) :: Nil,
+ returnType = resultType,
+ makeSplitFunction = body =>
+ s"""
+ |if (!$hasNull && $matched) {
+ | $body;
+ |}
+ """.stripMargin,
+ foldFunctions = _.map { funcCall =>
+ s"""
+ |if (!$hasNull && $matched) {
+ | $funcCall;
+ |}
+ """.stripMargin
+ }.mkString("\n"))
+ ev.copy(code =
+ code"""
+ |${valueGen.code}
+ |boolean $hasNull = false;
+ |boolean $matched = true;
+ |if (${valueGen.isNull}) {
+ | $hasNull = true;
+ |} else {
+ | $javaDataType $valueArg = ${valueGen.value};
+ | $codes
+ |}
+ |final boolean ${ev.isNull} = ($hasNull == true);
Review comment:
`hasNull` is already a boolean, so the `== true` comparison is redundant.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714352819
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708938150
**[Test build #129810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129810/testReport)** for PR 29999 at commit [`b770f92`](https://github.com/apache/spark/commit/b770f929594dd551a544fb6b0e5f9d4f2ddff7d4).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725411085
**[Test build #130924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130924/testReport)** for PR 29999 at commit [`d039c33`](https://github.com/apache/spark/commit/d039c33de33ea4bab4cea3170925c0c4f92ca771).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724800934
**[Test build #130860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130860/testReport)** for PR 29999 at commit [`1fc5214`](https://github.com/apache/spark/commit/1fc5214964a3a522f3cc0a1daf91ced342bb1b51).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant `
* `case class LikeAll(child: Expression, patterns: Seq[Any]) extends LikeAllBase `
* `case class NotLikeAll(child: Expression, patterns: Seq[Any]) extends LikeAllBase `
----------------------------------------------------------------
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r504677077
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,195 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ val arrayOrStr = TypeCollection(ArrayType(StringType), StringType)
+ StringType +: Seq.fill(children.size - 1)(arrayOrStr)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = value.foldable && list.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def escape(v: String): String = StringUtils.escapeLikeRegex(v, '\\')
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ list.foreach { e =>
+ val str = e.eval(input)
+ if (str == null) {
+ return null
+ }
+ val regex = Pattern.compile(escape(str.asInstanceOf[UTF8String].toString))
+ if(regex == null) {
Review comment:
`SELECT company FROM like_all_table WHERE company LIKE ALL ('%oo%', null);`
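A minimal sketch of what this query is expected to return, assuming the three-valued semantics used by the later revision quoted in this thread (NULL when every non-null pattern matches but the list contains a NULL, false as soon as any pattern fails). The object and helper names are hypothetical, and `likeToRegex` is a simplified stand-in for `StringUtils.escapeLikeRegex` that handles only `%` and `_`:

```scala
import java.util.regex.Pattern

object LikeAllNullSketch {
  // Simplified LIKE-to-regex translation: '%' -> ".*", '_' -> ".",
  // every other character is quoted literally. No escape character support.
  def likeToRegex(pattern: String): String =
    pattern.toSeq.map {
      case '%' => ".*"
      case '_' => "."
      case c => Pattern.quote(c.toString)
    }.mkString

  // None models SQL NULL for both the value and the individual patterns.
  def likeAll(value: Option[String], patterns: Seq[Option[String]]): Option[Boolean] =
    value.flatMap { v =>
      val allMatched = patterns.flatten.forall { p =>
        Pattern.compile(likeToRegex(p), Pattern.DOTALL).matcher(v).matches()
      }
      // A NULL pattern only matters when nothing else has already failed.
      if (allMatched && patterns.contains(None)) None else Some(allMatched)
    }
}
```

With the patterns from the query above, a row whose `company` contains `oo` evaluates to NULL and is filtered out, while a row that does not contain `oo` evaluates to false.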
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r504671005
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,195 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ val arrayOrStr = TypeCollection(ArrayType(StringType), StringType)
+ StringType +: Seq.fill(children.size - 1)(arrayOrStr)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = value.foldable && list.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def escape(v: String): String = StringUtils.escapeLikeRegex(v, '\\')
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ list.foreach { e =>
+ val str = e.eval(input)
+ if (str == null) {
+ return null
+ }
+ val regex = Pattern.compile(escape(str.asInstanceOf[UTF8String].toString))
+ if(regex == null) {
+ return null
+ } else if (isNot && matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+ return false
+ } else if (!isNot && !matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+ return false
+ }
+ }
+ return true
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val patternClass = classOf[Pattern].getName
+ val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex"
+ val javaDataType = CodeGenerator.javaType(value.dataType)
+ val valueGen = value.genCode(ctx)
+ val listGen = list.map(_.genCode(ctx))
+ val pattern = ctx.freshName("pattern")
+ val rightStr = ctx.freshName("rightStr")
+ val escapedEscapeChar = StringEscapeUtils.escapeJava("\\")
+ val hasNull = ctx.freshName("hasNull")
+ val matched = ctx.freshName("matched")
+ val valueArg = ctx.freshName("valueArg")
+ val listCode = listGen.map(x =>
+ s"""
+ |${x.code}
+ |if (${x.isNull}) {
+ | $hasNull = true; // ${ev.isNull} = true;
+ |} else if (!$hasNull && $matched) {
+ | String $rightStr = ${x.value}.toString();
+ | $patternClass $pattern =
+ | $patternClass.compile($escapeFunc($rightStr, '$escapedEscapeChar'));
+ | if ($isNot && $pattern.matcher($valueArg.toString()).matches()) {
+ | $matched = false;
+ | } else if (!$isNot && !$pattern.matcher($valueArg.toString()).matches()) {
+ | $matched = false;
+ | }
+ |}
+ """.stripMargin)
+
+ val resultType = CodeGenerator.javaType(dataType)
+ val codes = ctx.splitExpressionsWithCurrentInputs(
+ expressions = listCode,
+ funcName = "likeAll",
+ extraArguments = (javaDataType, valueArg) :: (CodeGenerator.JAVA_BOOLEAN, hasNull) ::
+ (resultType, matched) :: Nil,
+ returnType = resultType,
+ makeSplitFunction = body =>
+ s"""
+ |if (!$hasNull && $matched) {
+ | $body;
+ |}
+ """.stripMargin,
+ foldFunctions = _.map { funcCall =>
+ s"""
+ |if (!$hasNull && $matched) {
+ | $funcCall;
+ |}
+ """.stripMargin
+ }.mkString("\n"))
+ ev.copy(code =
+ code"""
+ |${valueGen.code}
+ |boolean $hasNull = false;
+ |boolean $matched = true;
+ |if (${valueGen.isNull}) {
+ | $hasNull = true;
+ |} else {
+ | $javaDataType $valueArg = ${valueGen.value};
+ | $codes
+ |}
+ |final boolean ${ev.isNull} = ($hasNull == true);
+ |final boolean ${ev.value} = ($matched == true);
+ """.stripMargin)
+ }
+}
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+ usage = "_FUNC_(str, pattern1, pattern2, ...) - Returns true if `str` matches all the pattern string, " +
Review comment:
The doc is not needed since we don't register the function.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728677472
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728744819
**[Test build #131189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131189/testReport)** for PR 29999 at commit [`97c1c73`](https://github.com/apache/spark/commit/97c1c7389e537f0d38f1b6a17bbe9ba70c9bc6ea).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r521164688
##########
File path: sql/core/src/test/resources/sql-functions/sql-expression-schema.md
##########
@@ -346,4 +346,4 @@
| org.apache.spark.sql.catalyst.expressions.xml.XPathList | xpath | SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()') | struct<xpath(<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>, a/b/text()):array<string>> |
| org.apache.spark.sql.catalyst.expressions.xml.XPathLong | xpath_long | SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)') | struct<xpath_long(<a><b>1</b><b>2</b></a>, sum(a/b)):bigint> |
| org.apache.spark.sql.catalyst.expressions.xml.XPathShort | xpath_short | SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)') | struct<xpath_short(<a><b>1</b><b>2</b></a>, sum(a/b)):smallint> |
-| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') | struct<xpath_string(<a><b>b</b><c>cc</c></a>, a/c):string> |
\ No newline at end of file
+| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') | struct<xpath_string(<a><b>b</b><c>cc</c></a>, a/c):string> |
Review comment:
I tried to revert it.
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r521109327
##########
File path: sql/core/src/test/resources/sql-functions/sql-expression-schema.md
##########
@@ -346,4 +346,4 @@
| org.apache.spark.sql.catalyst.expressions.xml.XPathList | xpath | SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()') | struct<xpath(<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>, a/b/text()):array<string>> |
| org.apache.spark.sql.catalyst.expressions.xml.XPathLong | xpath_long | SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)') | struct<xpath_long(<a><b>1</b><b>2</b></a>, sum(a/b)):bigint> |
| org.apache.spark.sql.catalyst.expressions.xml.XPathShort | xpath_short | SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)') | struct<xpath_short(<a><b>1</b><b>2</b></a>, sum(a/b)):smallint> |
-| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') | struct<xpath_string(<a><b>b</b><c>cc</c></a>, a/c):string> |
\ No newline at end of file
+| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') | struct<xpath_string(<a><b>b</b><c>cc</c></a>, a/c):string> |
Review comment:
Revert this change?
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706869315
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r504673341
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,195 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ val arrayOrStr = TypeCollection(ArrayType(StringType), StringType)
+ StringType +: Seq.fill(children.size - 1)(arrayOrStr)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = value.foldable && list.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def escape(v: String): String = StringUtils.escapeLikeRegex(v, '\\')
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ list.foreach { e =>
+ val str = e.eval(input)
+ if (str == null) {
+ return null
+ }
+ val regex = Pattern.compile(escape(str.asInstanceOf[UTF8String].toString))
+ if(regex == null) {
Review comment:
can this happen?
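For context, `java.util.regex.Pattern.compile` either returns a non-null `Pattern` or throws `PatternSyntaxException`, so a null check on its result looks unreachable. A minimal sketch that shows both outcomes (the `compileOrError` helper is hypothetical and only for illustration):

```scala
import java.util.regex.{Pattern, PatternSyntaxException}

object CompileNeverNullSketch {
  // Wraps Pattern.compile so both possible outcomes are visible in the type:
  // Right(pattern) on success, Left(description) when the regex is malformed.
  def compileOrError(regex: String): Either[String, Pattern] =
    try Right(Pattern.compile(regex))
    catch { case e: PatternSyntaxException => Left(e.getDescription) }
}
```

A valid regex such as `(?s).*oo.*` yields a `Right`, and a malformed one such as `(` yields a `Left`; neither path produces null.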
----------------------------------------------------------------
[GitHub] [spark] maropu commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r502782869
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -1408,7 +1408,13 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
case Some(SqlBaseParser.ANY) | Some(SqlBaseParser.SOME) =>
getLikeQuantifierExprs(ctx.expression).reduceLeft(Or)
case Some(SqlBaseParser.ALL) =>
- getLikeQuantifierExprs(ctx.expression).reduceLeft(And)
+ if (ctx.expression.isEmpty) {
+ throw new ParseException("Expected something between '(' and ')'.", ctx)
+ }
+ ctx.NOT match {
+ case null => LikeAll(e, ctx.expression.asScala.map(expression))
Review comment:
Does this change disable data source pushdown for LIKE (e.g., `StartsWith`, `EndsWith`)? If so, I think we could get a performance regression when reading from data sources.
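To illustrate the concern, here is a rough sketch of the kind of simplification that makes pushdown possible: a plain `LIKE 'abc%'` can be treated as a prefix test, which a data source can evaluate cheaply, whereas a single opaque `LikeAll` expression hides the individual patterns. The types and the `simplify` function below are hypothetical stand-ins written for this example, not Spark's optimizer code, and escape characters are deliberately left as the generic case:

```scala
object LikePatternSketch {
  sealed trait Simplified
  case class StartsWith(prefix: String) extends Simplified
  case class EndsWith(suffix: String) extends Simplified
  case class Contains(infix: String) extends Simplified
  case class Generic(pattern: String) extends Simplified

  // A fragment is "plain" when it has no wildcard or escape character.
  private def isPlain(s: String): Boolean =
    !s.exists(c => c == '%' || c == '_' || c == '\\')

  def simplify(pattern: String): Simplified = {
    val n = pattern.length
    if (n >= 2 && pattern.endsWith("%") && isPlain(pattern.dropRight(1))) {
      StartsWith(pattern.dropRight(1))
    } else if (n >= 2 && pattern.startsWith("%") && isPlain(pattern.drop(1))) {
      EndsWith(pattern.drop(1))
    } else if (n >= 3 && pattern.startsWith("%") && pattern.endsWith("%") &&
        isPlain(pattern.substring(1, n - 1))) {
      Contains(pattern.substring(1, n - 1))
    } else {
      Generic(pattern) // needs full regex matching; not a pushdown candidate here
    }
  }
}
```

Under this sketch, each pattern of a `LIKE ALL` list that simplifies to `StartsWith`, `EndsWith`, or `Contains` is a candidate for pushdown only if it is still visible as its own predicate.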
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-710125763
----------------------------------------------------------------
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r526617230
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,89 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[UTF8String]
+
+ protected def isNotLikeAll: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ val exprValue = child.eval(input)
+ if (exprValue == null) {
+ null
+ } else {
+ val allMatched = if (isNotLikeAll) {
+ !cache.exists(p => p.matcher(exprValue.toString).matches())
+ } else {
+ cache.forall(p => p.matcher(exprValue.toString).matches())
+ }
+ if (allMatched && hasNull) {
+ null
+ } else {
+ allMatched
+ }
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val eval = child.genCode(ctx)
+ val patternClass = classOf[Pattern].getName
+ val javaDataType = CodeGenerator.javaType(child.dataType)
+ val pattern = ctx.freshName("pattern")
+ val allMatched = ctx.freshName("allMatched")
+ val valueIsNull = ctx.freshName("valueIsNull")
+ val valueArg = ctx.freshName("valueArg")
+ val patternCache = ctx.addReferenceObj("patternCache", cache.asJava)
+
+ val matchCode = if (isNotLikeAll) {
Review comment:
OK
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714280358
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709890593
**[Test build #129884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129884/testReport)** for PR 29999 at commit [`f657ff0`](https://github.com/apache/spark/commit/f657ff0372f1cac48ea008a08c1cc7011f934d98).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708884791
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728669429
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35791/
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r524214644
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,90 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[UTF8String]
+
+ protected def isNotDefined: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ val exprValue = child.eval(input)
+ if (exprValue == null) {
+ null
+ } else {
+ val allMatched = if (isNotDefined) {
+ !cache.exists(p => p.matcher(exprValue.toString).matches())
+ } else {
+ cache.forall(p => p.matcher(exprValue.toString).matches())
+ }
+ if (allMatched && hasNull) {
+ null
+ } else {
+ allMatched
+ }
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val eval = child.genCode(ctx)
+ val patternClass = classOf[Pattern].getName
+ val javaDataType = CodeGenerator.javaType(child.dataType)
+ val pattern = ctx.freshName("pattern")
+ val allMatched = ctx.freshName("allMatched")
+ val valueIsNull = ctx.freshName("valueIsNull")
+ val valueArg = ctx.freshName("valueArg")
+ val patternHasNull = ctx.addReferenceObj("hasNull", hasNull)
Review comment:
It's a boolean constant. We can change the generated code based on it.
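One way to read this suggestion, as a minimal sketch: because `hasNull` is already known when the code is generated, the generator can emit a different Java statement for each case instead of passing the flag into the generated class as a reference object. The helper `genIsNullAssignment` and the variable names are hypothetical, and this is plain string generation rather than Spark's `CodegenContext` API:

```scala
object ConstantFoldedCodegenSketch {
  // Returns the Java statement that sets the result's null flag.
  // patternsHaveNull is a compile-time fact about the pattern list, so the
  // branch is taken here, in the generator, and never appears in the output.
  def genIsNullAssignment(
      isNullVar: String,
      allMatchedVar: String,
      patternsHaveNull: Boolean): String = {
    if (patternsHaveNull) {
      // With a NULL pattern, the result is NULL exactly when all others matched.
      s"$isNullVar = $allMatchedVar;"
    } else {
      // Without a NULL pattern, a non-null input never produces NULL.
      s"$isNullVar = false;"
    }
  }
}
```

The generated Java then contains no reference lookup and no runtime check of the flag in either case.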
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708926088
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708916497
**[Test build #129802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129802/testReport)** for PR 29999 at commit [`c32f89b`](https://github.com/apache/spark/commit/c32f89b8ba34b7b689ea5d2712f55824c99ba6f0).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708912448
**[Test build #129800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129800/testReport)** for PR 29999 at commit [`60f01f4`](https://github.com/apache/spark/commit/60f01f4edfbd112ea085e118c6a50f024c8c4dff).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728669440
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706527357
**[Test build #129623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129623/testReport)** for PR 29999 at commit [`4163382`](https://github.com/apache/spark/commit/41633827583d6f0d91e0e48b781c25c95ec06765).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-707064823
**[Test build #129672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129672/testReport)** for PR 29999 at commit [`3e41cff`](https://github.com/apache/spark/commit/3e41cffb800e8e3f5a485021706f38a4fc73e07c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class LikeAll(children: Seq[Expression]) extends LikeAllBase `
* `case class NotLikeAll(children: Seq[Expression]) extends LikeAllBase `
----------------------------------------------------------------
[GitHub] [spark] beliefer commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706919752
retest this please
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708787105
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129778/
----------------------------------------------------------------
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r504660905
##########
File path: sql/core/src/test/resources/sql-tests/inputs/regexp-functions.sql
##########
@@ -31,3 +31,13 @@ SELECT regexp_extract_all('1a 2b 14m', '(\\d+)([a-z]+)', 3);
SELECT regexp_extract_all('1a 2b 14m', '(\\d+)([a-z]+)', -1);
SELECT regexp_extract_all('1a 2b 14m', '(\\d+)?([a-z]+)', 1);
SELECT regexp_extract_all('a 2b 14m', '(\\d+)?([a-z]+)', 1);
+
+-- like_all
+SELECT like_all('foo', '%foo%', '%oo');
Review comment:
I have deleted this change.
----------------------------------------------------------------
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r505202129
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,125 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ StringType +: Seq.fill(children.size - 1)(StringType)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
Review comment:
OK
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728669440
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-715204141
----------------------------------------------------------------
[GitHub] [spark] wangyum commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r519773515
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +179,142 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes {
Review comment:
Could we make it support only `Literal` patterns? For example:
```scala
case class LikeAll(child: Expression, isNotDefined: Boolean, seq: mutable.Buffer[Any])
  extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
  override def dataType: DataType = BooleanType
  override def inputTypes: Seq[DataType] = StringType :: Nil
  @transient private[this] lazy val hasNull: Boolean = seq.contains(null)
  @transient private lazy val cachedPattern = seq.filterNot(_ == null)
    .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
  override protected def nullSafeEval(input1: Any): Any = {
    if (hasNull) {
      false
    } else {
      val str = input1.asInstanceOf[UTF8String].toString
      if (isNotDefined) {
        !cachedPattern.exists(p => p.matcher(str).matches())
      } else {
        cachedPattern.forall(p => p.matcher(str).matches())
      }
    }
  }
  // TODO: codegen
}
```
```scala
val exps = ctx.expression.asScala.map(expression)
validate(exps.nonEmpty, "Expected something between '(' and ')'.", ctx)
if (exps.size > 10 && exps.forall(_.foldable)) {
  LikeAll(e, isNotDefined, exps.map(_.eval(EmptyRow)))
} else {
  exps.map(p => invertIfNotDefined(Like(e, p))).reduceLeft(And)
}
```
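The evaluation side of this suggestion can be sketched without any Spark classes. The block below is a standalone illustration, not the PR's code: `likeToRegex` is a hypothetical, simplified stand-in for `StringUtils.escapeLikeRegex` (it ignores escape characters), and SQL NULL is modelled as `None`. For null patterns it follows SQL three-valued AND, as in the `eval` quoted from the later revision in this thread: the result is NULL only when every non-null pattern matches and at least one pattern is NULL.

```scala
import java.util.regex.Pattern

object LikeAllSketch {
  // Hypothetical, simplified LIKE-to-regex translation (an assumption, not
  // Spark's implementation): '%' becomes ".*", '_' becomes ".", and every
  // other character is quoted literally. Escape characters are not handled.
  def likeToRegex(pattern: String): Pattern = {
    val sb = new StringBuilder
    pattern.foreach {
      case '%' => sb.append(".*")
      case '_' => sb.append(".")
      case c   => sb.append(Pattern.quote(c.toString))
    }
    Pattern.compile(sb.toString, Pattern.DOTALL)
  }

  // Flat evaluation of `value LIKE ALL (patterns)`.
  // Returns Some(true), Some(false), or None for SQL NULL.
  def likeAll(value: Option[String], patterns: Seq[Option[String]]): Option[Boolean] =
    value match {
      case None => None // NULL input yields NULL
      case Some(v) =>
        val compiled = patterns.flatten.map(likeToRegex)
        val hasNull = patterns.contains(None)
        if (!compiled.forall(_.matcher(v).matches())) Some(false) // one miss decides
        else if (hasNull) None // all matched, but a NULL pattern remains
        else Some(true)
    }
}
```

The patterns are held in a flat sequence and checked in a loop, so the expression tree stays shallow however many patterns are listed, unlike the left-nested chain built by `reduceLeft(And)`.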
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724685114
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35472/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730400395
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730245197
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35936/
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-726638718
**[Test build #131050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131050/testReport)** for PR 29999 at commit [`7af8ffe`](https://github.com/apache/spark/commit/7af8ffe49fc02765a80a85faccaa7209fe8b9c57).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706919391
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730257842
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725372003
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35529/
----------------------------------------------------------------
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r521133744
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
##########
@@ -102,6 +102,8 @@ package object dsl {
def like(other: Expression, escapeChar: Char = '\\'): Expression =
Like(expr, other, escapeChar)
def rlike(other: Expression): Expression = RLike(expr, other)
+ def likeAll(others: Literal*): Expression = LikeAll(expr, others.map(_.eval(EmptyRow)))
Review comment:
OK
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714437108
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34758/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708944484
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-715204141
----------------------------------------------------------------
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r524848920
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,90 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[UTF8String]
+
+ protected def isNotDefined: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ val exprValue = child.eval(input)
+ if (exprValue == null) {
+ null
+ } else {
+ val allMatched = if (isNotDefined) {
+ !cache.exists(p => p.matcher(exprValue.toString).matches())
+ } else {
+ cache.forall(p => p.matcher(exprValue.toString).matches())
+ }
+ if (allMatched && hasNull) {
+ null
+ } else {
+ allMatched
+ }
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val eval = child.genCode(ctx)
+ val patternClass = classOf[Pattern].getName
+ val javaDataType = CodeGenerator.javaType(child.dataType)
+ val pattern = ctx.freshName("pattern")
+ val allMatched = ctx.freshName("allMatched")
+ val valueIsNull = ctx.freshName("valueIsNull")
+ val valueArg = ctx.freshName("valueArg")
+ val patternHasNull = ctx.addReferenceObj("hasNull", hasNull)
Review comment:
OK
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r524217980
##########
File path: sql/core/src/test/resources/sql-functions/sql-expression-schema.md
##########
@@ -346,4 +346,4 @@
| org.apache.spark.sql.catalyst.expressions.xml.XPathList | xpath | SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()') | struct<xpath(<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>, a/b/text()):array<string>> |
| org.apache.spark.sql.catalyst.expressions.xml.XPathLong | xpath_long | SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)') | struct<xpath_long(<a><b>1</b><b>2</b></a>, sum(a/b)):bigint> |
| org.apache.spark.sql.catalyst.expressions.xml.XPathShort | xpath_short | SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)') | struct<xpath_short(<a><b>1</b><b>2</b></a>, sum(a/b)):smallint> |
-| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') | struct<xpath_string(<a><b>b</b><c>cc</c></a>, a/c):string> |
\ No newline at end of file
+| org.apache.spark.sql.catalyst.expressions.xml.XPathString | xpath_string | SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') | struct<xpath_string(<a><b>b</b><c>cc</c></a>, a/c):string> |
Review comment:
What gets changed here?
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728680562
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35794/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728677472
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706836333
I have the same impression as @maropu's first comment. Could you please answer his question, @beliefer?
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730398742
**[Test build #131331 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131331/testReport)** for PR 29999 at commit [`001eb38`](https://github.com/apache/spark/commit/001eb38f603267c6a6f4e1c25430b8900644f5b7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724656881
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35467/
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728661500
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35791/
[GitHub] [spark] maropu commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706538267
In the PR description, could you describe why the stack overflow can happen in the current approach and why the fix in this PR can avoid the error?
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-726785967
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r526619223
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,89 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[UTF8String]
+
+ protected def isNotLikeAll: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ val exprValue = child.eval(input)
+ if (exprValue == null) {
+ null
+ } else {
+ val allMatched = if (isNotLikeAll) {
+ !cache.exists(p => p.matcher(exprValue.toString).matches())
+ } else {
+ cache.forall(p => p.matcher(exprValue.toString).matches())
+ }
+ if (allMatched && hasNull) {
+ null
+ } else {
+ allMatched
+ }
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val eval = child.genCode(ctx)
+ val patternClass = classOf[Pattern].getName
+ val javaDataType = CodeGenerator.javaType(child.dataType)
+ val pattern = ctx.freshName("pattern")
+ val allMatched = ctx.freshName("allMatched")
+ val valueIsNull = ctx.freshName("valueIsNull")
+ val valueArg = ctx.freshName("valueArg")
+ val patternCache = ctx.addReferenceObj("patternCache", cache.asJava)
+
+ val matchCode = if (isNotLikeAll) {
+ s"$pattern.matcher($valueArg.toString()).matches()"
+ } else {
+ s"!$pattern.matcher($valueArg.toString()).matches()"
+ }
+
+ ev.copy(code =
+ code"""
+ |${eval.code}
+ |boolean $allMatched = true;
Review comment:
The code flow can be:
```
boolean ${ev.isNull} = false;
boolean ${ev.value} = true;
if (${eval.isNull}) {
  ${ev.isNull} = true;
} else {
  $javaDataType $valueArg = ${eval.value};
  for ... {
    if (notMatched) {
      ${ev.value} = false;
      break;
    }
  }
  if (${ev.value} && hasNull) ${ev.isNull} = true;
}
```
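For readers following along, the flow outlined above can be mirrored in plain Scala outside of codegen. This is only a sketch under assumptions: `LikeAllFlowSketch`, `likeToRegex`, and `likeAll` are hypothetical names, the pattern translation is a simplified stand-in for `StringUtils.escapeLikeRegex` (it handles only `%` and `_`, with no escape character), and `Option[Boolean]` stands in for the `isNull`/`value` pair.

```scala
import java.util.regex.Pattern

object LikeAllFlowSketch {
  // Simplified, hypothetical stand-in for StringUtils.escapeLikeRegex:
  // only % and _ are wildcards; every other character is matched literally.
  def likeToRegex(like: String): Pattern = {
    val regex = like.map {
      case '%' => ".*"
      case '_' => "."
      case c => Pattern.quote(c.toString)
    }.mkString
    Pattern.compile(regex, Pattern.DOTALL)
  }

  // Mirrors the suggested flow: isNull/value flags, a loop that stops at the
  // first non-matching pattern, then the null check for the pattern list.
  def likeAll(input: String, patterns: Seq[String]): Option[Boolean] = {
    val hasNull = patterns.contains(null)
    val compiled = patterns.filter(_ != null).map(likeToRegex)
    var isNull = false
    var value = true
    if (input == null) {
      isNull = true
    } else {
      val it = compiled.iterator
      while (value && it.hasNext) {
        // Setting value to false ends the loop, playing the role of "break".
        if (!it.next().matcher(input).matches()) value = false
      }
      if (value && hasNull) isNull = true
    }
    if (isNull) None else Some(value) // None models SQL NULL
  }
}
```

With this shape, a null in the pattern list only turns the result into null when every non-null pattern has matched; a single non-matching pattern still yields false.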
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728769754
**[Test build #131211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131211/testReport)** for PR 29999 at commit [`f0e3de1`](https://github.com/apache/spark/commit/f0e3de1718e99c887833f230c77c17c3851f9fc7).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-711955225
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706545886
**[Test build #129623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129623/testReport)** for PR 29999 at commit [`4163382`](https://github.com/apache/spark/commit/41633827583d6f0d91e0e48b781c25c95ec06765).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant `
* `case class LikeAll(value: Expression, list: Seq[Expression]) extends LikeAllBase `
* `case class NotLikeAll(value: Expression, list: Seq[Expression]) extends LikeAllBase `
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r522716611
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,90 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[Any]
Review comment:
This should be `Seq[UTF8String]`.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714572132
**[Test build #130151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130151/testReport)** for PR 29999 at commit [`f160c64`](https://github.com/apache/spark/commit/f160c64b4c2bf8f07aaba09cffddb51fd727401c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714488547
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r503949480
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##########
@@ -344,6 +344,8 @@ object FunctionRegistry {
expression[Length]("length"),
expression[Levenshtein]("levenshtein"),
expression[Like]("like"),
+ expression[LikeAll]("like_all"),
+ expression[NotLikeAll]("not_like_all"),
Review comment:
I'd prefer not, unless they are common in other databases.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-715097050
**[Test build #130196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130196/testReport)** for PR 29999 at commit [`7b7120f`](https://github.com/apache/spark/commit/7b7120faaa0dcfd5e152cab135d1790a550f5fa9).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714425678
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34758/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706919955
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129657/
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725094540
**[Test build #130900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130900/testReport)** for PR 29999 at commit [`15bac5b`](https://github.com/apache/spark/commit/15bac5bfecb209ba7b6963d83423b659fbc5086d).
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r545253629
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -216,6 +216,18 @@ object SQLConf {
"for using switch statements in InSet must be non-negative and less than or equal to 600")
.createWithDefault(400)
+ val OPTIMIZER_LIKE_ALL_CONVERSION_THRESHOLD =
+ buildConf("spark.sql.optimizer.likeAllConversionThreshold")
+ .internal()
+ .doc("Configure the maximum size of the pattern sequence in like all. Spark will convert " +
+ "the logical combination of like to avoid StackOverflowError. 200 is an empirical value " +
+ "that will not cause StackOverflowError.")
+ .version("3.1.0")
+ .intConf
+ .checkValue(threshold => threshold >= 0, "The maximum size of pattern sequence " +
+ "in like all must be non-negative")
+ .createWithDefault(200)
Review comment:
We have removed this config: https://github.com/beliefer/spark/commit/9273d4250ddd5e011487a5a942c1b4d0f0412f78#diff-13c5b65678b327277c68d17910ae93629801af00117a0e3da007afd95b6c6764L219
We will always use the new expression for LIKE ALL.
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706545942
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725274237
**[Test build #130924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130924/testReport)** for PR 29999 at commit [`d039c33`](https://github.com/apache/spark/commit/d039c33de33ea4bab4cea3170925c0c4f92ca771).
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r504682711
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,195 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ val arrayOrStr = TypeCollection(ArrayType(StringType), StringType)
+ StringType +: Seq.fill(children.size - 1)(arrayOrStr)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = value.foldable && list.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def escape(v: String): String = StringUtils.escapeLikeRegex(v, '\\')
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ list.foreach { e =>
+ val str = e.eval(input)
+ if (str == null) {
+ return null
+ }
+ val regex = Pattern.compile(escape(str.asInstanceOf[UTF8String].toString))
+ if(regex == null) {
+ return null
+ } else if (isNot && matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+ return false
+ } else if (!isNot && !matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+ return false
+ }
+ }
+ return true
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val patternClass = classOf[Pattern].getName
+ val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex"
+ val javaDataType = CodeGenerator.javaType(value.dataType)
+ val valueGen = value.genCode(ctx)
+ val listGen = list.map(_.genCode(ctx))
+ val pattern = ctx.freshName("pattern")
+ val rightStr = ctx.freshName("rightStr")
+ val escapedEscapeChar = StringEscapeUtils.escapeJava("\\")
+ val hasNull = ctx.freshName("hasNull")
+ val matched = ctx.freshName("matched")
+ val valueArg = ctx.freshName("valueArg")
+ val listCode = listGen.map(x =>
+ s"""
+ |${x.code}
+ |if (${x.isNull}) {
+ | $hasNull = true; // ${ev.isNull} = true;
+ |} else if (!$hasNull && $matched) {
+ | String $rightStr = ${x.value}.toString();
+ | $patternClass $pattern =
+ | $patternClass.compile($escapeFunc($rightStr, '$escapedEscapeChar'));
+ | if ($isNot && $pattern.matcher($valueArg.toString()).matches()) {
+ | $matched = false;
+ | } else if (!$isNot && !$pattern.matcher($valueArg.toString()).matches()) {
+ | $matched = false;
+ | }
+ |}
+ """.stripMargin)
+
+ val resultType = CodeGenerator.javaType(dataType)
+ val codes = ctx.splitExpressionsWithCurrentInputs(
+ expressions = listCode,
+ funcName = "likeAll",
+ extraArguments = (javaDataType, valueArg) :: (CodeGenerator.JAVA_BOOLEAN, hasNull) ::
+ (resultType, matched) :: Nil,
+ returnType = resultType,
+ makeSplitFunction = body =>
+ s"""
+ |if (!$hasNull && $matched) {
+ | $body;
+ |}
+ """.stripMargin,
+ foldFunctions = _.map { funcCall =>
+ s"""
+ |if (!$hasNull && $matched) {
+ | $funcCall;
+ |}
+ """.stripMargin
+ }.mkString("\n"))
+ ev.copy(code =
+ code"""
+ |${valueGen.code}
+ |boolean $hasNull = false;
+ |boolean $matched = true;
+ |if (${valueGen.isNull}) {
+ | $hasNull = true;
+ |} else {
+ | $javaDataType $valueArg = ${valueGen.value};
+ | $codes
+ |}
+ |final boolean ${ev.isNull} = ($hasNull == true);
+ |final boolean ${ev.value} = ($matched == true);
Review comment:
Can we make the interpreted code path (`eval`) follow the codegen path? A similar code style would make this PR easier to review.
[GitHub] [spark] wangyum commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r521107867
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,86 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[Any]
+
+ protected def isNotDefined: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ if (hasNull) {
+ null
Review comment:
`null` -> `false`?
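One way to see what is at stake here, assuming `LIKE ALL` keeps the semantics of the `AND` chain of `LIKE` predicates it was previously rewritten into: under SQL three-valued logic, `false AND null` is `false` while `true AND null` is `null`, so a null in the pattern list does not by itself decide the result. The sketch below is illustrative only; `ThreeValuedAndSketch`, `and3`, and `allOf` are hypothetical names, and `Option[Boolean]` models a nullable boolean.

```scala
object ThreeValuedAndSketch {
  // Kleene AND over nullable booleans; None models SQL NULL.
  def and3(a: Option[Boolean], b: Option[Boolean]): Option[Boolean] =
    (a, b) match {
      case (Some(false), _) | (_, Some(false)) => Some(false)
      case (Some(true), Some(true)) => Some(true)
      case _ => None
    }

  // Combine the per-pattern LIKE results with and3, starting from TRUE.
  def allOf(results: Seq[Option[Boolean]]): Option[Boolean] =
    results.foldLeft(Option(true))(and3)
}
```

For example, `allOf(Seq(Some(true), None))` is `None`, whereas `allOf(Seq(Some(false), None))` is `Some(false)`: the null pattern matters only when every other pattern matched.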
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714370504
**[Test build #130151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130151/testReport)** for PR 29999 at commit [`f160c64`](https://github.com/apache/spark/commit/f160c64b4c2bf8f07aaba09cffddb51fd727401c).
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-712053647
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708914866
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708787077
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725157090
Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708935688
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708945114
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708867194
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-712229098
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708867201
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34385/
Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708944491
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34415/
Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-710019753
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714573850
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730169706
**[Test build #131326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131326/testReport)** for PR 29999 at commit [`001eb38`](https://github.com/apache/spark/commit/001eb38f603267c6a6f4e1c25430b8900644f5b7).
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r526611514
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,89 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[UTF8String]
+
+ protected def isNotLikeAll: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ val exprValue = child.eval(input)
+ if (exprValue == null) {
+ null
+ } else {
+ val allMatched = if (isNotLikeAll) {
Review comment:
to improve readability:
```
val matchFunc: Pattern => Boolean = if (isNotLikeAll) {
p => !p.matcher(exprValue.toString).matches()
} else {
p => p.matcher(exprValue.toString).matches()
}
if (cache.forall(matchFunc)) {
if (hasNull) null else true
} else {
false
}
```
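The null handling in the suggestion above can be tried outside Spark. The following is a minimal standalone sketch, not the PR's code: `LikeAllSketch`, `likeToRegex`, and `likeAll` are hypothetical names, `None` stands in for SQL NULL, and the LIKE-to-regex translation is deliberately simplified (no escape-character handling, unlike `StringUtils.escapeLikeRegex`).

```scala
import java.util.regex.Pattern

// Hypothetical helper, not Spark's API: evaluates `value [NOT] LIKE ALL (patterns)`
// with SQL three-valued logic. None stands for SQL NULL.
object LikeAllSketch {
  // Simplified LIKE-to-regex translation: % -> .*, _ -> ., everything else quoted.
  // Escape characters are intentionally not handled in this sketch.
  def likeToRegex(like: String): Pattern = {
    val sb = new StringBuilder
    like.foreach {
      case '%' => sb.append(".*")
      case '_' => sb.append(".")
      case c   => sb.append(Pattern.quote(c.toString))
    }
    Pattern.compile(sb.toString, Pattern.DOTALL)
  }

  def likeAll(
      value: Option[String],
      patterns: Seq[Option[String]],
      negate: Boolean): Option[Boolean] =
    value.flatMap { v =>
      val hasNull = patterns.contains(None)
      val compiled = patterns.flatten.map(likeToRegex)
      // Same shape as the reviewer's matchFunc: pick the predicate once.
      val matchFunc: Pattern => Boolean =
        if (negate) p => !p.matcher(v).matches() else p => p.matcher(v).matches()
      if (compiled.forall(matchFunc)) {
        // Every non-null pattern passed; a NULL pattern makes the result unknown.
        if (hasNull) None else Some(true)
      } else {
        // A definite failure wins over any NULL pattern.
        Some(false)
      }
    }
}
```

Under these assumptions, a NULL pattern only changes the outcome when every other pattern passes; a single failing pattern still yields `false`.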
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708787034
**[Test build #129778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129778/testReport)** for PR 29999 at commit [`369959f`](https://github.com/apache/spark/commit/369959f6c627004c99206fc6c9e252c9676b82a7).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706532052
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34227/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708913180
Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706863638
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708913180
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-711955276
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r526619223
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,89 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[UTF8String]
+
+ protected def isNotLikeAll: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ val exprValue = child.eval(input)
+ if (exprValue == null) {
+ null
+ } else {
+ val allMatched = if (isNotLikeAll) {
+ !cache.exists(p => p.matcher(exprValue.toString).matches())
+ } else {
+ cache.forall(p => p.matcher(exprValue.toString).matches())
+ }
+ if (allMatched && hasNull) {
+ null
+ } else {
+ allMatched
+ }
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val eval = child.genCode(ctx)
+ val patternClass = classOf[Pattern].getName
+ val javaDataType = CodeGenerator.javaType(child.dataType)
+ val pattern = ctx.freshName("pattern")
+ val allMatched = ctx.freshName("allMatched")
+ val valueIsNull = ctx.freshName("valueIsNull")
+ val valueArg = ctx.freshName("valueArg")
+ val patternCache = ctx.addReferenceObj("patternCache", cache.asJava)
+
+ val matchCode = if (isNotLikeAll) {
+ s"$pattern.matcher($valueArg.toString()).matches()"
+ } else {
+ s"!$pattern.matcher($valueArg.toString()).matches()"
+ }
+
+ ev.copy(code =
+ code"""
+ |${eval.code}
+ |boolean $allMatched = true;
Review comment:
the code flow can be
```
boolean ${ev.isNull} = false;
boolean ${ev.value} = true;
if (${eval.isNull}) {
${ev.isNull} = true;
} else {
$javaDataType $valueArg = ${eval.value};
for ... {
if (notMatched) ${ev.value} = false;
}
if (${ev.value} && hasNull) ${ev.isNull} = true;
}
```
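The same control flow can be written as ordinary Scala to check its behavior without the codegen machinery. This is a rough sketch under stated assumptions: `LikeAllFlowSketch`, `Result`, and `evalLikeAll` are hypothetical names, a `null` input models a NULL child value, and the early exit from the loop is an addition that the outline above does not spell out.

```scala
import java.util.regex.Pattern

object LikeAllFlowSketch {
  // Result pair mirroring the (isNull, value) variables of the generated code.
  final case class Result(isNull: Boolean, value: Boolean)

  // `cache` holds the compiled non-null patterns; `hasNull` records whether the
  // original pattern list contained a NULL literal.
  def evalLikeAll(
      input: String,
      cache: Array[Pattern],
      hasNull: Boolean,
      isNotLikeAll: Boolean): Result = {
    var isNull = false
    var value = true
    if (input == null) {
      isNull = true
    } else {
      var i = 0
      // Stop at the first pattern that breaks the predicate (assumed optimization).
      while (i < cache.length && value) {
        val matched = cache(i).matcher(input).matches()
        // For NOT LIKE ALL a match is a failure; for LIKE ALL a non-match is.
        val notMatched = if (isNotLikeAll) matched else !matched
        if (notMatched) value = false
        i += 1
      }
      // All patterns passed, but a NULL pattern makes the result unknown.
      if (value && hasNull) isNull = true
    }
    Result(isNull, value)
  }
}
```

Keeping `isNull` and `value` as two plain variables matches how the generated Java code would carry the result, which is why the sketch avoids `Option` here.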
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708698527
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708832273
**[Test build #129782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129782/testReport)** for PR 29999 at commit [`1754f0d`](https://github.com/apache/spark/commit/1754f0d3e234afbd69d408a27a2ca9dea11b4ba1).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-726784775
**[Test build #131050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131050/testReport)** for PR 29999 at commit [`7af8ffe`](https://github.com/apache/spark/commit/7af8ffe49fc02765a80a85faccaa7209fe8b9c57).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708874795
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725304059
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725255595
**[Test build #130915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130915/testReport)** for PR 29999 at commit [`d039c33`](https://github.com/apache/spark/commit/d039c33de33ea4bab4cea3170925c0c4f92ca771).
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725122529
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35506/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706535126
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708943814
**[Test build #129782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129782/testReport)** for PR 29999 at commit [`1754f0d`](https://github.com/apache/spark/commit/1754f0d3e234afbd69d408a27a2ca9dea11b4ba1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] wangyum commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r521106614
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
##########
@@ -102,6 +102,8 @@ package object dsl {
def like(other: Expression, escapeChar: Char = '\\'): Expression =
Like(expr, other, escapeChar)
def rlike(other: Expression): Expression = RLike(expr, other)
+ def likeAll(others: Literal*): Expression = LikeAll(expr, others.map(_.eval(EmptyRow)))
Review comment:
`others: Literal*` -> `others: String*`?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-725271492
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130915/
Test FAILed.
[GitHub] [spark] beliefer commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-707458123
@maropu If there are a lot of `LIKE` patterns, the `reduceLeft` will construct a very deep expression tree. Traversing that tree recurses once per level, so the call stack keeps growing until the thread's stack overflows.
```
at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
at scala.collection.immutable.List.foreach(List.scala:392)
......
```
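To illustrate the point about `reduceLeft`, here is a minimal sketch using toy classes rather than Catalyst's own. `LikeNode`, `AndNode`, and `DeepTreeSketch` are hypothetical names introduced only for this example. It shows that folding N predicates with `reduceLeft` gives a left-deep tree whose depth grows linearly with N, which is why a recursive traversal such as the `TreeNode.foreach` in the trace above can exhaust the stack.

```scala
// Toy stand-ins for expression nodes; these are NOT Spark's classes.
sealed trait Expr
final case class LikeNode(column: String, pattern: String) extends Expr
final case class AndNode(left: Expr, right: Expr) extends Expr

object DeepTreeSketch {
  // Combine predicates pairwise from the left:
  // AndNode(AndNode(AndNode(p1, p2), p3), p4) ...
  def chain(patterns: Seq[String]): Expr =
    patterns
      .map(p => LikeNode("id", p): Expr)
      .reduceLeft[Expr]((l, r) => AndNode(l, r))

  // Measure the depth with an explicit work list instead of recursion,
  // so that the measurement itself cannot overflow the stack.
  def depth(root: Expr): Int = {
    var maxDepth = 0
    var pending: List[(Expr, Int)] = List((root, 1))
    while (pending.nonEmpty) {
      val (node, d) = pending.head
      pending = pending.tail
      if (d > maxDepth) maxDepth = d
      node match {
        case AndNode(l, r) => pending = (l, d + 1) :: (r, d + 1) :: pending
        case _: LikeNode   => ()
      }
    }
    maxDepth
  }
}
```

With 10000 patterns, `DeepTreeSketch.depth(DeepTreeSketch.chain(...))` comes out at 10000 levels, one per pattern. Any visitor that calls itself once per level therefore needs a proportional number of stack frames.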
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706863628
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34264/
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706535126
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714280358
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708930779
Merged build finished. Test FAILed.
[GitHub] [spark] cloud-fan commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730505334
thanks, merging to master!
[GitHub] [spark] SparkQA removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-714370504
**[Test build #130151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130151/testReport)** for PR 29999 at commit [`f160c64`](https://github.com/apache/spark/commit/f160c64b4c2bf8f07aaba09cffddb51fd727401c).
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r505212166
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,125 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ StringType +: Seq.fill(children.size - 1)(StringType)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ var hasNull = false
+ var match = true
+ list.foreach { e =>
+ val str = e.eval(input)
Review comment:
Yes
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730202933
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r524214644
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,90 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[UTF8String]
+
+ protected def isNotDefined: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ val exprValue = child.eval(input)
+ if (exprValue == null) {
+ null
+ } else {
+ val allMatched = if (isNotDefined) {
+ !cache.exists(p => p.matcher(exprValue.toString).matches())
+ } else {
+ cache.forall(p => p.matcher(exprValue.toString).matches())
+ }
+ if (allMatched && hasNull) {
+ null
+ } else {
+ allMatched
+ }
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val eval = child.genCode(ctx)
+ val patternClass = classOf[Pattern].getName
+ val javaDataType = CodeGenerator.javaType(child.dataType)
+ val pattern = ctx.freshName("pattern")
+ val allMatched = ctx.freshName("allMatched")
+ val valueIsNull = ctx.freshName("valueIsNull")
+ val valueArg = ctx.freshName("valueArg")
+ val patternHasNull = ctx.addReferenceObj("hasNull", hasNull)
Review comment:
It's a boolean constant. We can change the generated code based on it.
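As a rough sketch of what this suggestion could look like (not the code that was merged): because `hasNull` is already known on the Scala side when the Java source is generated, the generator can choose which snippet to emit instead of passing the flag in as a runtime reference. The helper `nullCheckCode` and its parameter names are hypothetical and exist only for this illustration.

```scala
object CodegenConstantSketch {
  // Hypothetical helper: returns the Java statement that sets the result's
  // null flag. `hasNull` is a generation-time constant, so only one of the
  // two branches ever appears in the generated source.
  def nullCheckCode(hasNull: Boolean, isNullVar: String, allMatchedVar: String): String =
    if (hasNull) {
      // A NULL pattern exists: the result is NULL exactly when every
      // non-null pattern matched.
      s"$isNullVar = $allMatchedVar;"
    } else {
      // No NULL pattern: the result can never become NULL at this step.
      s"$isNullVar = false;"
    }
}
```

For example, `CodegenConstantSketch.nullCheckCode(false, "isNull_0", "allMatched_0")` produces the single statement `isNull_0 = false;`, so the generated class needs neither a runtime branch nor a reference object for the flag.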
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706960737
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724802450
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708601868
**[Test build #129757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129757/testReport)** for PR 29999 at commit [`d841b54`](https://github.com/apache/spark/commit/d841b54007d36963ede98a3745d4dd69c8f65c3e).
[GitHub] [spark] mridulm commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-730583151
@cloud-fan This is causing failures in the Scala 2.13 build.
See [this](https://github.com/apache/spark/pull/30164/checks?check_run_id=1425957338) for example.
+CC @dongjoon-hyun, @srowen
I believe @sunchao's PR is attempting to address it [here](https://github.com/apache/spark/pull/30431)
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706851744
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34263/
[GitHub] [spark] maropu commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706538725
One more question: does this PR's approach have the same performance as the current one when `LIKE ALL` has only a small number of elements?
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-724637618
**[Test build #130864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130864/testReport)** for PR 29999 at commit [`53406d3`](https://github.com/apache/spark/commit/53406d349a46dad7edf61e5eb2e27b11e92e508a).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-708926967
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129807/
Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728649075
**[Test build #131189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131189/testReport)** for PR 29999 at commit [`97c1c73`](https://github.com/apache/spark/commit/97c1c7389e537f0d38f1b6a17bbe9ba70c9bc6ea).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709864803
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129872/
Test FAILed.
[GitHub] [spark] cloud-fan commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r505196281
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,125 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ StringType +: Seq.fill(children.size - 1)(StringType)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ var hasNull = false
+ var match = true
Review comment:
`matched`
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-709863988
**[Test build #129872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129872/testReport)** for PR 29999 at commit [`be5eb8a`](https://github.com/apache/spark/commit/be5eb8a1f092e15c941d39d517284aed67de72c9).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] wangyum edited a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
wangyum edited a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-707634498
@maropu We can reproduce the `java.lang.StackOverflowError` in this way:
```scala
spark.sql("create table SPARK_33045(id string) using parquet")
val values = Range(1, 10000)
spark.sql(s"select * from SPARK_33045 where id like all (${values.mkString(", ")})").show
```
This is because we rewrite `LIKE ALL`/`LIKE ANY` into a chain of individual `LIKE` predicates, so the query above is equivalent to:
```scala
spark.sql(s"select * from SPARK_33045 where ${values.map(i => s"id like $i").mkString(" and ")}").show
```
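In contrast to the chained rewrite, a flat evaluation keeps the patterns in a plain list and loops over them, so the nesting depth does not depend on the number of patterns. The sketch below is only an illustration of that idea under simplifying assumptions; it is not the PR's implementation. `FlatLikeAllSketch`, `likeToRegex`, and `likeAll` are hypothetical names. `None` stands in for SQL NULL, and the LIKE-to-regex translation handles only `%` and `_`, with no escape character.

```scala
import java.util.regex.Pattern

object FlatLikeAllSketch {
  // Simplified LIKE-to-regex translation (an assumption of this sketch):
  // '%' matches any sequence, '_' matches one character, everything else
  // is taken literally. Escape characters are deliberately not handled.
  def likeToRegex(like: String): Pattern = {
    val sb = new StringBuilder
    like.foreach {
      case '%' => sb.append(".*")
      case '_' => sb.append(".")
      case c   => sb.append(Pattern.quote(c.toString))
    }
    Pattern.compile(sb.toString, Pattern.DOTALL)
  }

  // Three-valued result: Some(true), Some(false), or None for NULL.
  def likeAll(value: Option[String], patterns: Seq[Option[String]]): Option[Boolean] =
    value match {
      case None => None // a NULL input yields NULL
      case Some(v) =>
        // Compile only the non-null patterns and test them in a flat loop.
        val compiled = patterns.flatten.map(likeToRegex)
        val allMatched = compiled.forall(_.matcher(v).matches())
        // If everything matched but a NULL pattern exists, the answer is unknown.
        if (allMatched && patterns.contains(None)) None else Some(allMatched)
    }
}
```

Calling `FlatLikeAllSketch.likeAll(Some("abc"), Seq.fill(10000)(Some("a%")))` walks the list once and needs no deep recursion, whatever the list length.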
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-706919007
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728834744
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r505098317
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -176,6 +177,195 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+abstract class LikeAllBase extends Expression with ImplicitCastInputTypes with NullIntolerant {
+ def value: Expression = children.head
+ def list: Seq[Expression] = children.tail
+ def isNot: Boolean
+
+ override def inputTypes: Seq[AbstractDataType] = {
+ val arrayOrStr = TypeCollection(ArrayType(StringType), StringType)
+ StringType +: Seq.fill(children.size - 1)(arrayOrStr)
+ }
+
+ override def dataType: DataType = BooleanType
+
+ override def foldable: Boolean = value.foldable && list.forall(_.foldable)
+
+ override def nullable: Boolean = true
+
+ def escape(v: String): String = StringUtils.escapeLikeRegex(v, '\\')
+
+ def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()
+
+ override def eval(input: InternalRow): Any = {
+ val evaluatedValue = value.eval(input)
+ if (evaluatedValue == null) {
+ null
+ } else {
+ list.foreach { e =>
+ val str = e.eval(input)
+ if (str == null) {
+ return null
+ }
+ val regex = Pattern.compile(escape(str.asInstanceOf[UTF8String].toString))
+ if(regex == null) {
+ return null
+ } else if (isNot && matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+ return false
+ } else if (!isNot && !matches(regex, evaluatedValue.asInstanceOf[UTF8String].toString)) {
+ return false
+ }
+ }
+ return true
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val patternClass = classOf[Pattern].getName
+ val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") + ".escapeLikeRegex"
+ val javaDataType = CodeGenerator.javaType(value.dataType)
+ val valueGen = value.genCode(ctx)
+ val listGen = list.map(_.genCode(ctx))
+ val pattern = ctx.freshName("pattern")
+ val rightStr = ctx.freshName("rightStr")
+ val escapedEscapeChar = StringEscapeUtils.escapeJava("\\")
+ val hasNull = ctx.freshName("hasNull")
+ val matched = ctx.freshName("matched")
+ val valueArg = ctx.freshName("valueArg")
+ val listCode = listGen.map(x =>
+ s"""
+ |${x.code}
+ |if (${x.isNull}) {
+ | $hasNull = true; // ${ev.isNull} = true;
+ |} else if (!$hasNull && $matched) {
+ | String $rightStr = ${x.value}.toString();
+ | $patternClass $pattern =
+ | $patternClass.compile($escapeFunc($rightStr, '$escapedEscapeChar'));
+ | if ($isNot && $pattern.matcher($valueArg.toString()).matches()) {
+ | $matched = false;
+ | } else if (!$isNot && !$pattern.matcher($valueArg.toString()).matches()) {
+ | $matched = false;
+ | }
+ |}
+ """.stripMargin)
+
+ val resultType = CodeGenerator.javaType(dataType)
+ val codes = ctx.splitExpressionsWithCurrentInputs(
+ expressions = listCode,
+ funcName = "likeAll",
+ extraArguments = (javaDataType, valueArg) :: (CodeGenerator.JAVA_BOOLEAN, hasNull) ::
+ (resultType, matched) :: Nil,
+ returnType = resultType,
+ makeSplitFunction = body =>
+ s"""
+ |if (!$hasNull && $matched) {
+ | $body;
+ |}
+ """.stripMargin,
+ foldFunctions = _.map { funcCall =>
+ s"""
+ |if (!$hasNull && $matched) {
+ | $funcCall;
+ |}
+ """.stripMargin
+ }.mkString("\n"))
+ ev.copy(code =
+ code"""
+ |${valueGen.code}
+ |boolean $hasNull = false;
+ |boolean $matched = true;
+ |if (${valueGen.isNull}) {
+ | $hasNull = true;
+ |} else {
+ | $javaDataType $valueArg = ${valueGen.value};
+ | $codes
+ |}
+ |final boolean ${ev.isNull} = ($hasNull == true);
Review comment:
Yeah!
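
For readers following the quoted hunk: both the interpreted path and the generated Java code first turn each LIKE pattern into a Java regex through `StringUtils.escapeLikeRegex`, then compile it. The block below is only a rough, simplified sketch of that kind of translation, not Spark's implementation. `LikeRegexSketch` and `likeToRegex` are hypothetical names, and unlike the real function this sketch does not reject invalid escape sequences.

```scala
import java.util.regex.Pattern

// Hypothetical, simplified stand-in for a LIKE-to-regex translation.
// Assumption: any character following the escape character is taken literally.
object LikeRegexSketch {
  def likeToRegex(pattern: String, escapeChar: Char = '\\'): String = {
    // (?s) lets '.' match line terminators, so '_' and '%' span newlines.
    val out = new StringBuilder("(?s)")
    var i = 0
    while (i < pattern.length) {
      val c = pattern.charAt(i)
      if (c == escapeChar && i + 1 < pattern.length) {
        // Escaped character: emit it as a quoted literal and skip both chars.
        out.append(Pattern.quote(pattern.charAt(i + 1).toString))
        i += 2
      } else {
        c match {
          case '_'   => out.append('.')   // exactly one character
          case '%'   => out.append(".*")  // any run of characters
          case other => out.append(Pattern.quote(other.toString))
        }
        i += 1
      }
    }
    out.toString
  }

  // Convenience wrapper: does `value` match the LIKE pattern?
  def like(value: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern)).matcher(value).matches()
}
```

Note that the quoted hunk calls `Pattern.compile` for every pattern on every row, which is the cost the later revision avoids by compiling literal patterns once.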
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728912508
[GitHub] [spark] beliefer commented on a change in pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29999:
URL: https://github.com/apache/spark/pull/29999#discussion_r526626891
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -178,6 +180,89 @@ case class Like(left: Expression, right: Expression, escapeChar: Char)
}
}
+/**
+ * Optimized version of LIKE ALL, when all pattern values are literal.
+ */
+abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant {
+
+ protected def patterns: Seq[UTF8String]
+
+ protected def isNotLikeAll: Boolean
+
+ override def inputTypes: Seq[DataType] = StringType :: Nil
+
+ override def dataType: DataType = BooleanType
+
+ override def nullable: Boolean = true
+
+ private lazy val hasNull: Boolean = patterns.contains(null)
+
+ private lazy val cache = patterns.filterNot(_ == null)
+ .map(s => Pattern.compile(StringUtils.escapeLikeRegex(s.toString, '\\')))
+
+ override def eval(input: InternalRow): Any = {
+ val exprValue = child.eval(input)
+ if (exprValue == null) {
+ null
+ } else {
+ val allMatched = if (isNotLikeAll) {
+ !cache.exists(p => p.matcher(exprValue.toString).matches())
+ } else {
+ cache.forall(p => p.matcher(exprValue.toString).matches())
+ }
+ if (allMatched && hasNull) {
+ null
+ } else {
+ allMatched
+ }
+ }
+ }
+
+ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val eval = child.genCode(ctx)
+ val patternClass = classOf[Pattern].getName
+ val javaDataType = CodeGenerator.javaType(child.dataType)
+ val pattern = ctx.freshName("pattern")
+ val allMatched = ctx.freshName("allMatched")
+ val valueIsNull = ctx.freshName("valueIsNull")
+ val valueArg = ctx.freshName("valueArg")
+ val patternCache = ctx.addReferenceObj("patternCache", cache.asJava)
+
+ val matchCode = if (isNotLikeAll) {
+ s"$pattern.matcher($valueArg.toString()).matches()"
+ } else {
+ s"!$pattern.matcher($valueArg.toString()).matches()"
+ }
+
+ ev.copy(code =
+ code"""
+ |${eval.code}
+ |boolean $allMatched = true;
Review comment:
I learned more!
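
The `eval` in the quoted `LikeAllBase` hunk combines two ideas: the literal patterns are compiled once into a cache, and SQL three-valued logic is applied so that a NULL pattern makes the result NULL only when every non-null pattern matched. A minimal standalone sketch of that logic follows, under stated assumptions: `LikeAllSketch` and `LikeAll` are hypothetical names, `Option` models SQL NULL, and the patterns are taken to be already-translated Java regexes rather than LIKE patterns.

```scala
import java.util.regex.Pattern

// Hypothetical model of the LIKE ALL / NOT LIKE ALL null semantics.
// None stands for SQL NULL; patterns are assumed to be Java regexes already.
object LikeAllSketch {
  final class LikeAll(patterns: Seq[Option[String]], isNot: Boolean = false) {
    // Remember whether any pattern was NULL, then compile the rest once.
    private val hasNull: Boolean = patterns.contains(None)
    private val cache: Seq[Pattern] = patterns.flatten.map(p => Pattern.compile(p))

    def eval(value: Option[String]): Option[Boolean] = value.flatMap { v =>
      val allMatched =
        if (isNot) !cache.exists(_.matcher(v).matches())
        else cache.forall(_.matcher(v).matches())
      // A definite false wins over NULL; a tentative true is downgraded to NULL.
      if (allMatched && hasNull) None else Some(allMatched)
    }
  }
}
```

For example, with patterns `a.*` and NULL, the value `abc` evaluates to NULL because the unknown pattern might still fail, while `xyz` evaluates to false because one known pattern already failed.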
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29999:
URL: https://github.com/apache/spark/pull/29999#issuecomment-728680575