Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/08/09 08:27:35 UTC

[GitHub] spark pull request #22052: [SPARK-25068][SQL] Add exists function.

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/22052

    [SPARK-25068][SQL] Add exists function.

    ## What changes were proposed in this pull request?
    
    This PR adds an `exists` function, which tests whether a predicate holds for one or more elements in the array.
    
    ```sql
    > SELECT exists(array(1, 2, 3), x -> x % 2 == 0);
     true
    ```
    
    ## How was this patch tested?
    
    Added tests.
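
As a rough illustration of the semantics described in the pull request, the following sketch mirrors the SQL example on a plain Scala collection. It is not the Spark implementation. `ExistsSemanticsSketch` and `arrayExists` are hypothetical names introduced here, and the sketch does not model SQL null handling.

```scala
object ExistsSemanticsSketch {
  // Hypothetical helper (not part of Spark): returns true when the
  // predicate holds for at least one element of the input sequence.
  def arrayExists[A](input: Seq[A])(pred: A => Boolean): Boolean =
    input.exists(pred)

  def main(args: Array[String]): Unit = {
    // Mirrors: SELECT exists(array(1, 2, 3), x -> x % 2 == 0);
    println(arrayExists(Seq(1, 2, 3))(x => x % 2 == 0)) // prints "true"
  }
}
```

An empty sequence yields false under this sketch, since no element can satisfy the predicate.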


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-25068/exists

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22052.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22052
    
----
commit 9b47b027d572dd90149a7ffd928a9ede256dae29
Author: Takuya UESHIN <ue...@...>
Date:   2018-08-08T09:08:16Z

    Add exists function.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    **[Test build #94483 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94483/testReport)** for PR 22052 at commit [`9b47b02`](https://github.com/apache/spark/commit/9b47b027d572dd90149a7ffd928a9ede256dae29).


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    **[Test build #94510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94510/testReport)** for PR 22052 at commit [`85b356e`](https://github.com/apache/spark/commit/85b356eab4a5be6529fb7409bb6e459c59cf5056).


---


[GitHub] spark pull request #22052: [SPARK-25068][SQL] Add exists function.

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22052#discussion_r208970323
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -356,6 +356,52 @@ case class ArrayFilter(
       override def prettyName: String = "filter"
     }
     
    +/**
    + * Tests whether a predicate holds for one or more elements in the array.
    + */
    +@ExpressionDescription(usage =
    +  "_FUNC_(expr, pred) - Tests whether a predicate holds for one or more elements in the array.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(1, 2, 3), x -> x % 2 == 0);
    +       true
    +  """,
    +  since = "2.4.0")
    +case class ArrayExists(
    +    input: Expression,
    +    function: Expression)
    +  extends ArrayBasedSimpleHigherOrderFunction with CodegenFallback {
    +
    +  override def nullable: Boolean = input.nullable
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def expectingFunctionType: AbstractDataType = BooleanType
    +
    +  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ArrayExists = {
    +    val elem = HigherOrderFunction.arrayArgumentType(input.dataType)
    +    copy(function = f(function, elem :: Nil))
    +  }
    +
    +  @transient lazy val LambdaFunction(_, Seq(elementVar: NamedLambdaVariable), _) = function
    +
    +  override def nullSafeEval(inputRow: InternalRow, value: Any): Any = {
    +    val arr = value.asInstanceOf[ArrayData]
    +    val f = functionForEval
    +    var i = 0
    +    while (i < arr.numElements) {
    +      elementVar.value.set(arr.get(i, elementVar.dataType))
    +      if (f.eval(inputRow).asInstanceOf[Boolean]) {
    +        return true
    --- End diff --
    
    Shall we use a `var exists = false` to hold the result and stop the loop as soon as it becomes true, i.e. `while (i < arr.numElements && !exists)`?
    
    IIUC, `return` in Scala is implemented by throwing an exception, which may cause a performance issue.
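
The two loop shapes discussed in this review comment can be sketched side by side on a plain Scala collection. This is a minimal illustration only, not the code that was merged. `ExistsLoopSketch`, `existsWithReturn`, and `existsWithFlag` are hypothetical names, and `IndexedSeq` stands in for Spark's `ArrayData`.

```scala
object ExistsLoopSketch {
  // Early-exit variant, shaped like the loop in the diff:
  // `return` leaves the method as soon as a match is found.
  def existsWithReturn[A](arr: IndexedSeq[A])(pred: A => Boolean): Boolean = {
    var i = 0
    while (i < arr.length) {
      if (pred(arr(i))) {
        return true
      }
      i += 1
    }
    false
  }

  // Flag variant, shaped like the reviewer's suggestion:
  // the loop condition stops iterating once `exists` becomes true.
  def existsWithFlag[A](arr: IndexedSeq[A])(pred: A => Boolean): Boolean = {
    var exists = false
    var i = 0
    while (i < arr.length && !exists) {
      if (pred(arr(i))) {
        exists = true
      }
      i += 1
    }
    exists
  }
}
```

Both variants stop at the first matching element and return the same result. As a side note, the exception-based implementation of `return` in Scala 2 applies to non-local returns, that is, a `return` written inside a closure. A `return` placed directly in a `while` loop of the method body, as in the first variant, should compile to an ordinary return, so the flag variant is mainly a matter of style here.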


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    **[Test build #94483 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94483/testReport)** for PR 22052 at commit [`9b47b02`](https://github.com/apache/spark/commit/9b47b027d572dd90149a7ffd928a9ede256dae29).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayExists(`


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94483/
    Test PASSed.


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    cc @hvanhovell @gatorsmile 


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2007/
    Test PASSed.


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94510/
    Test PASSed.


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    **[Test build #94510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94510/testReport)** for PR 22052 at commit [`85b356e`](https://github.com/apache/spark/commit/85b356eab4a5be6529fb7409bb6e459c59cf5056).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    LGTM


---


[GitHub] spark issue #22052: [SPARK-25068][SQL] Add exists function.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22052
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1991/
    Test PASSed.


---


[GitHub] spark pull request #22052: [SPARK-25068][SQL] Add exists function.

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22052


---
