Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/08/09 08:27:35 UTC
[GitHub] spark pull request #22052: [SPARK-25068][SQL] Add exists function.
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/22052
[SPARK-25068][SQL] Add exists function.
## What changes were proposed in this pull request?
This PR adds the `exists` function, which tests whether a predicate holds for one or more elements in the array.
```sql
> SELECT exists(array(1, 2, 3), x -> x % 2 == 0);
true
```
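For readers more familiar with Scala collections, the semantics described above correspond roughly to `Seq.exists`. The following is a minimal sketch in plain Scala, not the Spark implementation; the object name `ExistsSemantics` is only illustrative, and SQL null handling is ignored.

```scala
object ExistsSemantics {
  // Plain-Scala analogue of the SQL expression
  // exists(array(1, 2, 3), x -> x % 2 == 0).
  val hasEven: Boolean = Seq(1, 2, 3).exists(x => x % 2 == 0)

  // No element satisfies the predicate, so the result is false.
  val hasLarge: Boolean = Seq(1, 2, 3).exists(x => x > 10)

  // An empty sequence has no element for which the predicate can hold.
  val emptyCase: Boolean = Seq.empty[Int].exists(x => x % 2 == 0)

  def main(args: Array[String]): Unit = {
    println(hasEven)
    println(hasLarge)
    println(emptyCase)
  }
}
```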
## How was this patch tested?
Added tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ueshin/apache-spark issues/SPARK-25068/exists
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22052.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22052
----
commit 9b47b027d572dd90149a7ffd928a9ede256dae29
Author: Takuya UESHIN <ue...@...>
Date: 2018-08-08T09:08:16Z
Add exists function.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22052
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22052
**[Test build #94483 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94483/testReport)** for PR 22052 at commit [`9b47b02`](https://github.com/apache/spark/commit/9b47b027d572dd90149a7ffd928a9ede256dae29).
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22052
**[Test build #94510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94510/testReport)** for PR 22052 at commit [`85b356e`](https://github.com/apache/spark/commit/85b356eab4a5be6529fb7409bb6e459c59cf5056).
---
[GitHub] spark pull request #22052: [SPARK-25068][SQL] Add exists function.
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22052#discussion_r208970323
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -356,6 +356,52 @@ case class ArrayFilter(
override def prettyName: String = "filter"
}
+/**
+ * Tests whether a predicate holds for one or more elements in the array.
+ */
+@ExpressionDescription(usage =
+ "_FUNC_(expr, pred) - Tests whether a predicate holds for one or more elements in the array.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array(1, 2, 3), x -> x % 2 == 0);
+ true
+ """,
+ since = "2.4.0")
+case class ArrayExists(
+ input: Expression,
+ function: Expression)
+ extends ArrayBasedSimpleHigherOrderFunction with CodegenFallback {
+
+ override def nullable: Boolean = input.nullable
+
+ override def dataType: DataType = BooleanType
+
+ override def expectingFunctionType: AbstractDataType = BooleanType
+
+ override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ArrayExists = {
+ val elem = HigherOrderFunction.arrayArgumentType(input.dataType)
+ copy(function = f(function, elem :: Nil))
+ }
+
+ @transient lazy val LambdaFunction(_, Seq(elementVar: NamedLambdaVariable), _) = function
+
+ override def nullSafeEval(inputRow: InternalRow, value: Any): Any = {
+ val arr = value.asInstanceOf[ArrayData]
+ val f = functionForEval
+ var i = 0
+ while (i < arr.numElements) {
+ elementVar.value.set(arr.get(i, elementVar.dataType))
+ if (f.eval(inputRow).asInstanceOf[Boolean]) {
+ return true
--- End diff --
Shall we use a `var exists = false` to hold the result, and stop the loop once the result is true: `while (i < arr.numElements && !exists)`?
IIUC, `return` in Scala is implemented by throwing an exception, which may cause performance issues.
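The flag-based loop suggested in this comment can be sketched as follows. This is a stand-alone illustration under stated assumptions, not the code in the PR: `ExistsLoop` and `existsWithFlag` are hypothetical names, and an ordinary Scala function stands in for the bound lambda that `ArrayExists.nullSafeEval` evaluates per element.

```scala
object ExistsLoop {
  // Hypothetical stand-alone helper. In the PR the loop lives in
  // ArrayExists.nullSafeEval and evaluates a bound lambda expression
  // instead of a Scala function.
  def existsWithFlag[A](arr: IndexedSeq[A])(pred: A => Boolean): Boolean = {
    var exists = false
    var i = 0
    // && short-circuits, so the loop ends at the first matching element
    // without needing an explicit return.
    while (i < arr.length && !exists) {
      if (pred(arr(i))) {
        exists = true
      }
      i += 1
    }
    exists
  }

  def main(args: Array[String]): Unit = {
    println(existsWithFlag(Vector(1, 2, 3))(_ % 2 == 0))
    println(existsWithFlag(Vector(1, 3, 5))(_ % 2 == 0))
  }
}
```

As an aside, to my knowledge Scala uses an exception only for non-local returns (a `return` inside a closure), so it may be worth checking whether that cost applies to a `return` placed directly in a `while` loop before treating it as a performance concern.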
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22052
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22052
**[Test build #94483 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94483/testReport)** for PR 22052 at commit [`9b47b02`](https://github.com/apache/spark/commit/9b47b027d572dd90149a7ffd928a9ede256dae29).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class ArrayExists(`
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22052
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94483/
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/22052
cc @hvanhovell @gatorsmile
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22052
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22052
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2007/
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22052
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22052
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94510/
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22052
**[Test build #94510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94510/testReport)** for PR 22052 at commit [`85b356e`](https://github.com/apache/spark/commit/85b356eab4a5be6529fb7409bb6e459c59cf5056).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22052
LGTM
---
[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22052
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1991/
---
[GitHub] spark pull request #22052: [SPARK-25068][SQL] Add exists function.
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22052
---