You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2018/10/23 16:49:49 UTC

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/22809

    [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) aggregates

    ## What changes were proposed in this pull request?
    
    Implements Every, Some, Any aggregates in SQL. These new aggregate expressions are analyzed in normal way and rewritten to equivalent existing aggregate expressions in the optimizer.
    
    Every(x) => Min(x)  where x is boolean.
    Some(x) => Max(x) where x is boolean.
    
    Any is a synonym for Some.
    SQL
    ```
    explain extended select every(v) from test_agg group by k;
    ```
    Plan : 
    ```
    == Parsed Logical Plan ==
    'Aggregate ['k], [unresolvedalias('every('v), None)]
    +- 'UnresolvedRelation `test_agg`
    
    == Analyzed Logical Plan ==
    every(v): boolean
    Aggregate [k#0], [every(v#1) AS every(v)#5]
    +- SubqueryAlias `test_agg`
       +- Project [k#0, v#1]
          +- SubqueryAlias `test_agg`
             +- LocalRelation [k#0, v#1]
    
    == Optimized Logical Plan ==
    Aggregate [k#0], [min(v#1) AS every(v)#5]
    +- LocalRelation [k#0, v#1]
    
    == Physical Plan ==
    *(2) HashAggregate(keys=[k#0], functions=[min(v#1)], output=[every(v)#5])
    +- Exchange hashpartitioning(k#0, 200)
       +- *(1) HashAggregate(keys=[k#0], functions=[partial_min(v#1)], output=[k#0, min#7])
          +- LocalTableScan [k#0, v#1]
    Time taken: 0.512 seconds, Fetched 1 row(s)
    ```
    
    ## How was this patch tested?
    Added tests in SQLQueryTestSuite, DataframeAggregateSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark SPARK-19851-specific-rewrite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22809.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22809
    
----
commit e6c5c84ea189b1fdbac8408594c880950b6b7398
Author: Dilip Biswal <db...@...>
Date:   2018-10-16T06:23:41Z

    tests

commit b793d06cb937db300c78e4eb4cd143c385419e57
Author: Dilip Biswal <db...@...>
Date:   2018-10-23T16:43:04Z

    Code changes

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #98035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98035/testReport)** for PR 22809 at commit [`2bc9965`](https://github.com/apache/spark/commit/2bc996515ec1947b8a1b82f942bd6ebecc473277).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228236882
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    just to be clear, here I am not talking about `@Since`, I am talking about `since` as parameter of `@ExpressionDescription`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r227901365
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by.sql ---
    @@ -80,3 +80,69 @@ SELECT 1 FROM range(10) HAVING true;
     SELECT 1 FROM range(10) HAVING MAX(id) > 0;
     
     SELECT id FROM range(10) HAVING id > 0;
    +
    +-- Test data
    +CREATE OR REPLACE TEMPORARY VIEW test_agg AS SELECT * FROM VALUES
    +  (1, true), (1, false),
    +  (2, true),
    +  (3, false), (3, null),
    +  (4, null), (4, null),
    +  (5, null), (5, true), (5, false) AS test_agg(k, v);
    +
    +-- empty table
    +SELECT every(v), some(v), any(v) FROM test_agg WHERE 1 = 0;
    +
    +-- all null values
    +SELECT every(v), some(v), any(v) FROM test_agg WHERE k = 4;
    +
    +-- aggregates are null Filtering
    +SELECT every(v), some(v), any(v) FROM test_agg WHERE k = 5;
    +
    +-- group by
    +SELECT k, every(v), some(v), any(v) FROM test_agg GROUP BY k;
    +
    +-- having
    +SELECT k, every(v) FROM test_agg GROUP BY k HAVING every(v) = false;
    +SELECT k, every(v) FROM test_agg GROUP BY k HAVING every(v) IS NULL;
    +
    +-- basic subquery path to make sure rewrite happens in both parent and child plans.
    +SELECT k,
    +       Every(v) AS every
    +FROM   test_agg
    +WHERE  k = 2
    +       AND v IN (SELECT Any(v)
    +                 FROM   test_agg
    +                 WHERE  k = 1)
    +GROUP  BY k;
    +
    +-- basic subquery path to make sure rewrite happens in both parent and child plans.
    +SELECT k,
    +       Every(v) AS every
    +FROM   test_agg
    +WHERE  k = 2
    +       AND v IN (SELECT Every(v)
    +                 FROM   test_agg
    +                 WHERE  k = 1)
    +GROUP  BY k;
    +
    +-- input type checking Int
    +SELECT every(1);
    +
    +-- input type checking Short
    +SELECT some(1S);
    +
    +-- input type checking Long
    +SELECT any(1L);
    +
    +-- input type checking String
    +SELECT every("true");
    +
    +-- every/some/any aggregates are not supported as windows expression.
    +SELECT k, v, every(v) OVER (PARTITION BY k ORDER BY v) FROM test_agg;
    --- End diff --
    
    @cloud-fan here are the a few window tests. (fyi)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4449/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r232564281
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    (btw, let's add `since` at `ExpressionDescription` wherever possible .. )


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228143607
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -282,6 +283,31 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
       override lazy val canonicalized: Expression = child.canonicalized
     }
     
    +/**
    + * An aggregate expression that gets rewritten (currently by the optimizer) into a
    + * different aggregate expression for evaluation. This is mainly used to provide compatibility
    + * with other databases. For example, we use this to support every, any/some aggregates by rewriting
    + * them with Min and Max respectively.
    + */
    +trait UnevaluableAggrgate extends DeclarativeAggregate {
    --- End diff --
    
    typo: `UnevaluableAggrgate` -> `UnevaluableAggregate`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228143956
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    --- End diff --
    
    can we change this to something like `UnevaluableBooleanAggBase` and make also `EveryAgg` extend this, in order to avoid code duplication?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4492/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228232332
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    --- End diff --
    
    maybe we can move it close to `UnevaluableAggrgate`. @cloud-fan @dilipbiswal WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228144250
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ---
    @@ -21,24 +21,32 @@ import scala.collection.mutable
     
     import org.apache.spark.sql.catalyst.catalog.SessionCatalog
     import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.expressions.aggregate._
     import org.apache.spark.sql.catalyst.plans.logical._
     import org.apache.spark.sql.catalyst.rules._
     import org.apache.spark.sql.catalyst.util.DateTimeUtils
     import org.apache.spark.sql.types._
     
     
     /**
    - * Finds all [[RuntimeReplaceable]] expressions and replace them with the expressions that can
    - * be evaluated. This is mainly used to provide compatibility with other databases.
    - * For example, we use this to support "nvl" by replacing it with "coalesce".
    + * Finds all the expressions that are unevaluable and replace/rewrite them with semantically
    + * equivalent expressions that can be evaluated. Currently we replace two kinds of expressions :
    + * 1) [[RuntimeReplaceable]] expressions
    + * 2) [[UnevaluableAggrgate]] expressions such as Every, Some, Any
    + * This is mainly used to provide compatibility with other databases.
    + * Few examples are :
    + *   we use this to support "nvl" by replacing it with "coalesce".
    + *   we use this to replace Every and Any with Min and Max respectively.
      */
     object ReplaceExpressions extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
         case e: RuntimeReplaceable => e.child
    +    case SomeAgg(arg) => Max(arg)
    +    case AnyAgg(arg) => Max(arg)
    +    case EveryAgg(arg) => Min(arg)
       }
     }
     
    -
    --- End diff --
    
    nit: unneded change


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228644504
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    @mgaido91 I tried it [here](https://github.com/apache/spark/compare/master...dilipbiswal:SPARK-19851-rewrite?expand=1). I had trouble getting it to work for window expressions. Thats why @cloud-fan suggested to try the current approach in this [comment](https://github.com/apache/spark/pull/22797#issuecomment-432106729)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228633284
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    why don't we reuse `RuntimeReplaceable`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98035/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #97986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97986/testReport)** for PR 22809 at commit [`9e194b5`](https://github.com/apache/spark/commit/9e194b57cb30f022392c3fb01959984b9fc17f27).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4461/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228635201
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -282,6 +283,31 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
       override lazy val canonicalized: Expression = child.canonicalized
     }
     
    +/**
    + * An aggregate expression that gets rewritten (currently by the optimizer) into a
    + * different aggregate expression for evaluation. This is mainly used to provide compatibility
    + * with other databases. For example, we use this to support every, any/some aggregates by rewriting
    + * them with Min and Max respectively.
    + */
    +trait UnevaluableAggregate extends DeclarativeAggregate {
    +
    +  override def nullable: Boolean = true
    --- End diff --
    
    shouldn't this be nullable only if the incoming expression is?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228144382
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala ---
    @@ -144,6 +144,8 @@ class ExpressionTypeCheckingSuite extends SparkFunSuite {
         assertSuccess(Sum('stringField))
         assertSuccess(Average('stringField))
         assertSuccess(Min('arrayField))
    +    assertSuccess(new EveryAgg('booleanField))
    +    assertSuccess(new AnyAgg('booleanField))
    --- End diff --
    
    shall we add also `SomeAgg`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    @cloud-fan @mgaido91 I have incorporated the comments. Could we please check if things look okay now ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228709483
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    We can leave a TODO saying that we should create a framework to replace aggregate functions, but I think the current patch is good enough for these 3 functions, and I'm not aware of more functions like them that we need to deal with.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #98033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98033/testReport)** for PR 22809 at commit [`08999f9`](https://github.com/apache/spark/commit/08999f98a3af6c7a30c545cec1c3657498fb39c0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98139/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4560/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r227881077
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ---
    @@ -38,6 +39,18 @@ object ReplaceExpressions extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    --- End diff --
    
    @cloud-fan We could also add these transformations in `ReplaceExpressions` rule and not require a dedicated rule (fyi).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4490/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228230112
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    I think the point is that it was not available until 2.3, so earlier methods don't have it. Am I missing something?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228635413
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    why it doesn't work? Sorry if the question is dumb, but I can't see the problem in using it here, probably I am missing something.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r227899342
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ---
    @@ -38,6 +39,18 @@ object ReplaceExpressions extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    --- End diff --
    
    ah this sounds better!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #97999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97999/testReport)** for PR 22809 at commit [`6abf844`](https://github.com/apache/spark/commit/6abf844f728471b737a047a81c3f287060506473).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228238276
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    Ideally we need `Since` here. Some functions don't have them because at that time the `Since` method was not there. We should add missing `Since` to them as well, if other people have time to do it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228640680
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    ok, I see you mentioned that there were issues using `RuntimeReplaceble`. Can we have a similar approach to that though? I mean introducing a new method here which returns the replaced value, so that in `ReplaceExpressions` we can simply match on `UnevaluableAggregate` and include the logic of replacement in the single items?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #97986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97986/testReport)** for PR 22809 at commit [`9e194b5`](https://github.com/apache/spark/commit/9e194b57cb30f022392c3fb01959984b9fc17f27).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228142745
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -300,6 +300,10 @@ object FunctionRegistry {
         expression[CollectList]("collect_list"),
         expression[CollectSet]("collect_set"),
         expression[CountMinSketchAgg]("count_min_sketch"),
    +    expression[EveryAgg]("every"),
    +    expression[AnyAgg]("any"),
    +    expression[SomeAgg]("some"),
    +
    --- End diff --
    
    nit: unneeded newline


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228638310
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    @mgaido91 Actually i tried a few different ideas. The are in the comment of original PR [link](https://github.com/apache/spark/pull/22047).  I had prepared two branches with two approaches .. [comment] (https://github.com/apache/spark/pull/22047#discussion_r225700828). Could you please take a look ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97986/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228013503
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
    @@ -727,4 +728,67 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
           "grouping expressions: [current_date(None)], value: [key: int, value: string], " +
             "type: GroupBy]"))
       }
    +
    +  def getEveryAggColumn(columnName: String): Column = {
    +    Column(new EveryAgg(Column(columnName).expr).toAggregateExpression(false))
    --- End diff --
    
    Since we don't have APIs for them in `functions`, it's not likely users will use then with DataFrame. Thus I think we don't need these tests.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #98033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98033/testReport)** for PR 22809 at commit [`08999f9`](https://github.com/apache/spark/commit/08999f98a3af6c7a30c545cec1c3657498fb39c0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait UnevaluableAggregate extends DeclarativeAggregate `
      * `abstract class UnevaluableBooleanAggBase(arg: Expression)`
      * `case class EveryAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg) `
      * `case class AnyAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg) `
      * `case class SomeAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg) `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Can we use these functions in window with this approach?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228143393
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    +case class AnyAgg(arg: Expression) extends AnyAggBase(arg) {
    +  override def nodeName: String = "Any"
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    cc @cloud-fan @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228227624
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    I will add. One observation : 
    
    I see that, only in the API's we specify the @since .. for example for aggregate Max, we have it in api function.scala:Max ..  These aggregates are not exposed in the dataset apis and none of the other aggregates seem to have it. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228706515
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    yes, but in the analyzer you analyze the children of the current expression right? So we would just have something like `def replacedExpression = Min(arg)`, which means doing exactly the same which is done now, the only difference is where the conversion logic is put. Adn IMHO having all the conversion logic for all the expression in `ReplaceExpressions` is harder to maintain than having all the logic related to the expression contained in it.
    
    Anyway, since you have a different opinion, let's see what @cloud-fan thinks on this. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r232571944
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    @HyukjinKwon Sure.. I will open a pr shortly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228143428
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala ---
    @@ -57,3 +57,27 @@ case class Min(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = min
     }
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if all values of `expr` are true.")
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4408/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97999/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #98035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98035/testReport)** for PR 22809 at commit [`2bc9965`](https://github.com/apache/spark/commit/2bc996515ec1947b8a1b82f942bd6ebecc473277).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97930/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228725645
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    @cloud-fan @mgaido91 Thank you. I have added a TODO for now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228634827
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -282,6 +283,31 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
       override lazy val canonicalized: Expression = child.canonicalized
     }
     
    +/**
    + * An aggregate expression that gets rewritten (currently by the optimizer) into a
    + * different aggregate expression for evaluation. This is mainly used to provide compatibility
    + * with other databases. For example, we use this to support every, any/some aggregates by rewriting
    + * them with Min and Max respectively.
    + */
    +trait UnevaluableAggregate extends DeclarativeAggregate {
    +
    +  override def nullable: Boolean = true
    --- End diff --
    
    @mgaido91 most of the aggregates are nullable, no ? Did you have an suggestion here ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #98139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98139/testReport)** for PR 22809 at commit [`07205de`](https://github.com/apache/spark/commit/07205dea343539cb812622205fd0534b77f183d0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228020366
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
    @@ -727,4 +728,67 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
           "grouping expressions: [current_date(None)], value: [key: int, value: string], " +
             "type: GroupBy]"))
       }
    +
    +  def getEveryAggColumn(columnName: String): Column = {
    +    Column(new EveryAgg(Column(columnName).expr).toAggregateExpression(false))
    --- End diff --
    
    @cloud-fan Ok, let me remove these.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228666789
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    @mgaido91 I re-read your comments again and here is what i think you are implying : 
       1. In UnevaluableAggregate, define a method say `replacedExpression`
       2. The sub-class overrides it and return an actual rewritten expression
       3. In optimized, match UnevaluableAggregate and call `replacedExpression`
    
    Did i understand it correctly ? If so, actually, i wouldn't prefer that way. The reason is, the replacedExpression is hidden from analyzer and so it may not be safe. The way ReplaceExpression framework is nicely designed, the analyzer resolves the rewritten expression normally (as its the child expression). Thats the reason, i have opted for specific/targeted rewrites. 
    
    If Wenchen and you think otherwise, then i can change.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228143368
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    add `since = "3.0.0"`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    @cloud-fan Yeah.. I have some tests in group-by.sql. Please take a look.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228636110
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -282,6 +283,31 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
       override lazy val canonicalized: Expression = child.canonicalized
     }
     
    +/**
    + * An aggregate expression that gets rewritten (currently by the optimizer) into a
    + * different aggregate expression for evaluation. This is mainly used to provide compatibility
    + * with other databases. For example, we use this to support every, any/some aggregates by rewriting
    + * them with Min and Max respectively.
    + */
    +trait UnevaluableAggregate extends DeclarativeAggregate {
    +
    +  override def nullable: Boolean = true
    --- End diff --
    
    @mgaido91 I think for aggregates, its different ? Please `Max`, `Min`, they all define it to be nullable. I think they work on group of rows and can return null on empty input.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Thanks a lot @cloud-fan @mgaido91 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228235105
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    @mgaido91 Yeah... If we look at the definition of other aggregate function like Max, Min etc, they don't seem to have the `@Since`.  However, they are defined in the for `functions.scala` for max and min as `@Since 1.3`.  So basically i was not sure on the what is the rule when a function is not exposed in dataset api but only from SQL.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228239274
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    +  extends UnevaluableAggrgate with ImplicitCastInputTypes {
    +
    +  override def children: Seq[Expression] = arg :: Nil
    +
    +  override def dataType: DataType = BooleanType
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    arg.dataType match {
    +      case dt if dt != BooleanType =>
    +        TypeCheckResult.TypeCheckFailure(s"Input to function '$prettyName' should have been " +
    +          s"${BooleanType.simpleString}, but it's [${arg.dataType.catalogString}].")
    +      case _ => TypeCheckResult.TypeCheckSuccess
    +    }
    +  }
    +}
    +
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
    --- End diff --
    
    @mgaido91 @cloud-fan Thanks .. will add.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #98139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98139/testReport)** for PR 22809 at commit [`07205de`](https://github.com/apache/spark/commit/07205dea343539cb812622205fd0534b77f183d0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #97930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97930/testReport)** for PR 22809 at commit [`b793d06`](https://github.com/apache/spark/commit/b793d06cb937db300c78e4eb4cd143c385419e57).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228631452
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -282,6 +283,31 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
       override lazy val canonicalized: Expression = child.canonicalized
     }
     
    +/**
    + * An aggregate expression that gets rewritten (currently by the optimizer) into a
    + * different aggregate expression for evaluation. This is mainly used to provide compatibility
    + * with other databases. For example, we use this to support every, any/some aggregates by rewriting
    + * them with Min and Max respectively.
    + */
    +trait UnevaluableAggregate extends DeclarativeAggregate {
    +
    +  override def nullable: Boolean = true
    --- End diff --
    
    why do we set them always as nullable?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98033/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22809


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228229829
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    --- End diff --
    
    @mgaido91 I had a confusion on where to house this class ? Is it okay if i just rename it and keep it in Max.scala ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #97999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97999/testReport)** for PR 22809 at commit [`6abf844`](https://github.com/apache/spark/commit/6abf844f728471b737a047a81c3f287060506473).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228144076
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ---
    @@ -21,24 +21,32 @@ import scala.collection.mutable
     
     import org.apache.spark.sql.catalyst.catalog.SessionCatalog
     import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.expressions.aggregate._
     import org.apache.spark.sql.catalyst.plans.logical._
     import org.apache.spark.sql.catalyst.rules._
     import org.apache.spark.sql.catalyst.util.DateTimeUtils
     import org.apache.spark.sql.types._
     
     
     /**
    - * Finds all [[RuntimeReplaceable]] expressions and replace them with the expressions that can
    - * be evaluated. This is mainly used to provide compatibility with other databases.
    - * For example, we use this to support "nvl" by replacing it with "coalesce".
    + * Finds all the expressions that are unevaluable and replace/rewrite them with semantically
    + * equivalent expressions that can be evaluated. Currently we replace two kinds of expressions :
    --- End diff --
    
    nit: extra space before `:`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228238508
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala ---
    @@ -57,3 +57,34 @@ case class Max(child: Expression) extends DeclarativeAggregate {
     
       override lazy val evaluateExpression: AttributeReference = max
     }
    +
    +abstract class AnyAggBase(arg: Expression)
    --- End diff --
    
    If it's hard to decide where to put it, I think putting it in a new file can be considered.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228634159
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.aggregate
    +
    +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.types._
    +
    +abstract class UnevaluableBooleanAggBase(arg: Expression)
    --- End diff --
    
    @mgaido91 RuntimeReplaceble works for scalar expressions but not for aggregate expressions.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    **[Test build #97930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97930/testReport)** for PR 22809 at commit [`b793d06`](https://github.com/apache/spark/commit/b793d06cb937db300c78e4eb4cd143c385419e57).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait UnevaluableAggrgate extends DeclarativeAggregate `
      * `abstract class AnyAggBase(arg: Expression)`
      * `case class AnyAgg(arg: Expression) extends AnyAggBase(arg) `
      * `case class SomeAgg(arg: Expression) extends AnyAggBase(arg) `
      * `case class EveryAgg(arg: Expression)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22809
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22809#discussion_r228637047
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -282,6 +283,31 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
       override lazy val canonicalized: Expression = child.canonicalized
     }
     
    +/**
    + * An aggregate expression that gets rewritten (currently by the optimizer) into a
    + * different aggregate expression for evaluation. This is mainly used to provide compatibility
    + * with other databases. For example, we use this to support every, any/some aggregates by rewriting
    + * them with Min and Max respectively.
    + */
    +trait UnevaluableAggregate extends DeclarativeAggregate {
    +
    +  override def nullable: Boolean = true
    --- End diff --
    
    Ah, right, I was missing that case, sorry, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org