You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2018/04/10 16:56:52 UTC

[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/21031

    [SPARK-23923][SQL] Add cardinality function

    ## What changes were proposed in this pull request?
    
    The PR adds the SQL function `cardinality`. The behavior of the function is based on Presto's one.
    
    The function returns the length of the array or map stored in the column as `BigInt`.
    
    ## How was this patch tested?
    
    Added UTs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-23923

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21031.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21031
    
----
commit 89458544b564097f2fa6d34ae654c86f155410df
Author: Kazuaki Ishizaki <is...@...>
Date:   2018-04-10T16:52:36Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180992933
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -24,6 +25,47 @@ import org.apache.spark.sql.catalyst.expressions.codegen._
     import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, MapData}
     import org.apache.spark.sql.types._
     
    +/**
    + * Common base class for [[Size]] and [[Cardinality]].
    + */
    +abstract class SizeUtil extends UnaryExpression with ExpectsInputTypes {
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    +  override def nullable: Boolean = false
    +
    +  def sizeEval(child: Expression, input: InternalRow, resultTypeBigInt: Boolean): Any = {
    +    val value = child.eval(input)
    +    val result = if (value == null) {
    +      -1
    +    } else child.dataType match {
    +      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    +      case _: MapType => value.asInstanceOf[MapData].numElements()
    +    }
    +    if (resultTypeBigInt) {
    +      new Decimal().setOrNull(result.asInstanceOf[Int].toLong, DecimalType.MAX_PRECISION, 0)
    +    } else {
    +      result
    +    }
    +  }
    +
    +  def doSizeGenCode(ctx: CodegenContext, ev: ExprCode, resultTypeBigInt: Boolean): ExprCode = {
    --- End diff --
    
    in this way you can directly write the `eval` and `doGenCode` methods and in the `Size` and `Cardinality` classes we just need to override `def resultTypeBigInt` setting it to `true` or `false`. I think it is cleaner, but it is not a big deal.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89990/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #90017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90017/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180763490
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -59,6 +60,36 @@ case class Size(child: Expression) extends UnaryExpression with ExpectsInputType
       }
     }
     
    +@ExpressionDescription(
    --- End diff --
    
    good catch, thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89669/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90068/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #90026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90026/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180807089
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -24,6 +25,47 @@ import org.apache.spark.sql.catalyst.expressions.codegen._
     import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, MapData}
     import org.apache.spark.sql.types._
     
    +/**
    + * Common base class for [[Size]] and [[Cardinality]].
    + */
    +abstract class SizeUtil extends UnaryExpression with ExpectsInputTypes {
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    +  override def nullable: Boolean = false
    +
    +  def sizeEval(child: Expression, input: InternalRow, resultTypeBigInt: Boolean): Any = {
    +    val value = child.eval(input)
    +    val result = if (value == null) {
    +      -1
    +    } else child.dataType match {
    +      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    +      case _: MapType => value.asInstanceOf[MapData].numElements()
    +    }
    +    if (resultTypeBigInt) {
    +      new Decimal().setOrNull(result.asInstanceOf[Int].toLong, DecimalType.MAX_PRECISION, 0)
    +    } else {
    +      result
    +    }
    +  }
    +
    +  def doSizeGenCode(ctx: CodegenContext, ev: ExprCode, resultTypeBigInt: Boolean): ExprCode = {
    --- End diff --
    
    Would it be possible to elaborate on the advantage of adding `def resultTypeBigInt` instead of passing as an argument?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #90068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90068/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r182816541
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -407,6 +407,7 @@ object FunctionRegistry {
         expression[MapKeys]("map_keys"),
         expression[MapValues]("map_values"),
         expression[Size]("size"),
    +    expression[Cardinality]("cardinality"),
    --- End diff --
    
    Alias names are implemented like that. No need to create a new one. We did it in the other SQL functions too. like char,  char_length and many. We can update the description in the existing ExpressionDescription 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89646/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89189/testReport)** for PR 21031 at commit [`d405d8a`](https://github.com/apache/spark/commit/d405d8a37c6e46340f49ff449b8a5aeca80de751).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89269 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89269/testReport)** for PR 21031 at commit [`11fe6f7`](https://github.com/apache/spark/commit/11fe6f7bad1690f181f620516274ebd070712b3c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89206/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180993164
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -34,28 +76,38 @@ import org.apache.spark.sql.types._
           > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
            4
       """)
    -case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +case class Size(child: Expression) extends SizeUtil {
       override def dataType: DataType = IntegerType
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    -  override def nullable: Boolean = false
     
       override def eval(input: InternalRow): Any = {
    -    val value = child.eval(input)
    -    if (value == null) {
    -      -1
    -    } else child.dataType match {
    -      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    -      case _: MapType => value.asInstanceOf[MapData].numElements()
    -    }
    +    sizeEval(child, input, false)
       }
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    val childGen = child.genCode(ctx)
    -    ev.copy(code = s"""
    -      boolean ${ev.isNull} = false;
    -      ${childGen.code}
    -      ${CodeGenerator.javaType(dataType)} ${ev.value} = ${childGen.isNull} ? -1 :
    -        (${childGen.value}).numElements();""", isNull = FalseLiteral)
    +    doSizeGenCode(ctx, ev, false)
    +  }
    +}
    +
    +/**
    + * Given an array or map, returns its size as BigInt. Returns -1 if null.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns the size of an array or a map as BigInt. Returns -1 if null.",
    --- End diff --
    
    I will update https://github.com/apache/spark/pull/21037, too



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89280/testReport)** for PR 21031 at commit [`11fe6f7`](https://github.com/apache/spark/commit/11fe6f7bad1690f181f620516274ebd070712b3c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90046/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2188/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2176/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2256/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89669/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r183110758
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -407,6 +407,7 @@ object FunctionRegistry {
         expression[MapKeys]("map_keys"),
         expression[MapValues]("map_values"),
         expression[Size]("size"),
    +    expression[Cardinality]("cardinality"),
    --- End diff --
    
    Thank you for your clarification. I see.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89141/testReport)** for PR 21031 at commit [`8945854`](https://github.com/apache/spark/commit/89458544b564097f2fa6d34ae654c86f155410df).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89189/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2311/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r182622583
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -407,6 +407,7 @@ object FunctionRegistry {
         expression[MapKeys]("map_keys"),
         expression[MapValues]("map_values"),
         expression[Size]("size"),
    +    expression[Cardinality]("cardinality"),
    --- End diff --
    
    I see. I will update this PR with only `expression[Cardinality]("cardinality")`. I will also update the description and JIRA later.
    
    Is it OK with you? @gatorsmile cc: @ueshin 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180762559
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -59,6 +60,36 @@ case class Size(child: Expression) extends UnaryExpression with ExpectsInputType
       }
     }
     
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns the size of an array or a map as BigInt. Returns -1 if null.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
    +       4
    +  """,
    +  since = "2.4.0")
    +case class Cardinality(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +  override def dataType: DataType = DecimalType.BigIntDecimal
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    +  override def nullable: Boolean = false
    +
    +  val size = Size(child)
    --- End diff --
    
    Is it better way to extend a case class?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2221/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89258/testReport)** for PR 21031 at commit [`11fe6f7`](https://github.com/apache/spark/commit/11fe6f7bad1690f181f620516274ebd070712b3c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89280/testReport)** for PR 21031 at commit [`11fe6f7`](https://github.com/apache/spark/commit/11fe6f7bad1690f181f620516274ebd070712b3c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r182627739
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -407,6 +407,7 @@ object FunctionRegistry {
         expression[MapKeys]("map_keys"),
         expression[MapValues]("map_values"),
         expression[Size]("size"),
    +    expression[Cardinality]("cardinality"),
    --- End diff --
    
    One question: Do we need `@ExpressionDescription`? If we need this, we may need to create `Cardinality` case class that is the same as `Size`. Of course, `Cardinality` and `Size` can extend the same parent class.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2540/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2267/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    According to my understanding, these activities are to improve compatibility with other DBs (like Presto) in https://issues.apache.org/jira/browse/SPARK-23899 and https://issues.apache.org/jira/browse/SPARK-23923.
    
    As you pointed out, `cardinality` and `size` has the same except data type. I used the same implementation.
    
    @gatorsmile what do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180754264
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -59,6 +60,36 @@ case class Size(child: Expression) extends UnaryExpression with ExpectsInputType
       }
     }
     
    +@ExpressionDescription(
    --- End diff --
    
    missing scala doc


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Jenkins, retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89150/testReport)** for PR 21031 at commit [`d405d8a`](https://github.com/apache/spark/commit/d405d8a37c6e46340f49ff449b8a5aeca80de751).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180929145
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -34,28 +76,38 @@ import org.apache.spark.sql.types._
           > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
            4
       """)
    -case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +case class Size(child: Expression) extends SizeUtil {
       override def dataType: DataType = IntegerType
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    -  override def nullable: Boolean = false
     
       override def eval(input: InternalRow): Any = {
    -    val value = child.eval(input)
    -    if (value == null) {
    -      -1
    -    } else child.dataType match {
    -      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    -      case _: MapType => value.asInstanceOf[MapData].numElements()
    -    }
    +    sizeEval(child, input, false)
       }
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    val childGen = child.genCode(ctx)
    -    ev.copy(code = s"""
    -      boolean ${ev.isNull} = false;
    -      ${childGen.code}
    -      ${CodeGenerator.javaType(dataType)} ${ev.value} = ${childGen.isNull} ? -1 :
    -        (${childGen.value}).numElements();""", isNull = FalseLiteral)
    +    doSizeGenCode(ctx, ev, false)
    +  }
    +}
    +
    +/**
    + * Given an array or map, returns its size as BigInt. Returns -1 if null.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns the size of an array or a map as BigInt. Returns -1 if null.",
    --- End diff --
    
    BigInt is Presto's data type name, I think we should use SparkSQL's data type here. Btw, I think we should use LongType instead of DecimalType.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #90046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90046/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89258/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89254/testReport)** for PR 21031 at commit [`b4f9839`](https://github.com/apache/spark/commit/b4f98396784db88dae2139c74c267526dc72d82b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    LGTM pending Jenkins


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2168/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r181034039
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -24,6 +25,47 @@ import org.apache.spark.sql.catalyst.expressions.codegen._
     import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, MapData}
     import org.apache.spark.sql.types._
     
    +/**
    + * Common base class for [[Size]] and [[Cardinality]].
    + */
    +abstract class SizeUtil extends UnaryExpression with ExpectsInputTypes {
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    +  override def nullable: Boolean = false
    +
    +  def sizeEval(child: Expression, input: InternalRow, resultTypeBigInt: Boolean): Any = {
    +    val value = child.eval(input)
    +    val result = if (value == null) {
    +      -1
    +    } else child.dataType match {
    +      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    +      case _: MapType => value.asInstanceOf[MapData].numElements()
    +    }
    +    if (resultTypeBigInt) {
    +      new Decimal().setOrNull(result.asInstanceOf[Int].toLong, DecimalType.MAX_PRECISION, 0)
    +    } else {
    +      result
    +    }
    +  }
    +
    +  def doSizeGenCode(ctx: CodegenContext, ev: ExprCode, resultTypeBigInt: Boolean): ExprCode = {
    --- End diff --
    
    even better!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89206/testReport)** for PR 21031 at commit [`67f8716`](https://github.com/apache/spark/commit/67f87169a1f11d80aecbd6c035a1bc707c37c400).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class SizeUtil extends UnaryExpression with ExpectsInputTypes `
      * `case class Size(child: Expression) extends SizeUtil `
      * `case class Cardinality(child: Expression) extends SizeUtil `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180992908
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -34,28 +76,38 @@ import org.apache.spark.sql.types._
           > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
            4
       """)
    -case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +case class Size(child: Expression) extends SizeUtil {
       override def dataType: DataType = IntegerType
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    -  override def nullable: Boolean = false
     
       override def eval(input: InternalRow): Any = {
    -    val value = child.eval(input)
    -    if (value == null) {
    -      -1
    -    } else child.dataType match {
    -      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    -      case _: MapType => value.asInstanceOf[MapData].numElements()
    -    }
    +    sizeEval(child, input, false)
       }
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    val childGen = child.genCode(ctx)
    -    ev.copy(code = s"""
    -      boolean ${ev.isNull} = false;
    -      ${childGen.code}
    -      ${CodeGenerator.javaType(dataType)} ${ev.value} = ${childGen.isNull} ? -1 :
    -        (${childGen.value}).numElements();""", isNull = FalseLiteral)
    +    doSizeGenCode(ctx, ev, false)
    +  }
    +}
    +
    +/**
    + * Given an array or map, returns its size as BigInt. Returns -1 if null.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns the size of an array or a map as BigInt. Returns -1 if null.",
    --- End diff --
    
    Good catch, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180766173
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -59,6 +60,36 @@ case class Size(child: Expression) extends UnaryExpression with ExpectsInputType
       }
     }
     
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns the size of an array or a map as BigInt. Returns -1 if null.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
    +       4
    +  """,
    +  since = "2.4.0")
    +case class Cardinality(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +  override def dataType: DataType = DecimalType.BigIntDecimal
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    +  override def nullable: Boolean = false
    +
    +  val size = Size(child)
    --- End diff --
    
    oh, I see... maybe a common abstract class as you did in another case?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    The SparkR test failure is not related. Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89646/testReport)** for PR 21031 at commit [`c6e8d81`](https://github.com/apache/spark/commit/c6e8d81aed5da53569242a648652d2a46eddecc3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89990/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r181344225
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3282,6 +3282,14 @@ object functions {
        */
       def size(e: Column): Column = withExpr { Size(e.expr) }
     
    +  /**
    +   * Returns length of array or map as BigInt.
    --- End diff --
    
    Good catch, thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2253/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r181034805
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -34,28 +66,38 @@ import org.apache.spark.sql.types._
           > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
            4
       """)
    -case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +case class Size(child: Expression) extends SizeUtil {
       override def dataType: DataType = IntegerType
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    -  override def nullable: Boolean = false
     
       override def eval(input: InternalRow): Any = {
    -    val value = child.eval(input)
    -    if (value == null) {
    -      -1
    -    } else child.dataType match {
    -      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    -      case _: MapType => value.asInstanceOf[MapData].numElements()
    -    }
    +    sizeEval(child, input)
       }
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    val childGen = child.genCode(ctx)
    -    ev.copy(code = s"""
    -      boolean ${ev.isNull} = false;
    -      ${childGen.code}
    -      ${CodeGenerator.javaType(dataType)} ${ev.value} = ${childGen.isNull} ? -1 :
    -        (${childGen.value}).numElements();""", isNull = FalseLiteral)
    +    doSizeGenCode(ctx, ev)
    +  }
    +}
    +
    +/**
    + * Given an array or map, returns its size as BigInt. Returns -1 if null.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns the size of an array or a map as long. Returns -1 if null.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
    +       4
    +  """,
    +  since = "2.4.0")
    +case class Cardinality(child: Expression) extends SizeUtil {
    +  override def dataType: DataType = LongType
    +
    +  override def eval(input: InternalRow): Any = {
    --- End diff --
    
    it is quite pointless to have this and `def doGenCode` here and in `Size`. Can't we just implement `eval` and `doGenCode` in `SizeUtil`? ie. rename there `doSizeGenCode` -> `doGenCode` and `sizeEval` -> `eval`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180753851
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -59,6 +60,36 @@ case class Size(child: Expression) extends UnaryExpression with ExpectsInputType
       }
     }
     
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns the size of an array or a map as BigInt. Returns -1 if null.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
    +       4
    +  """,
    +  since = "2.4.0")
    +case class Cardinality(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +  override def dataType: DataType = DecimalType.BigIntDecimal
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    +  override def nullable: Boolean = false
    +
    +  val size = Size(child)
    --- End diff --
    
    can't we extend `Size` instead?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89169 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89169/testReport)** for PR 21031 at commit [`d405d8a`](https://github.com/apache/spark/commit/d405d8a37c6e46340f49ff449b8a5aeca80de751).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89141/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r180804842
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -24,6 +25,47 @@ import org.apache.spark.sql.catalyst.expressions.codegen._
     import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, MapData}
     import org.apache.spark.sql.types._
     
    +/**
    + * Common base class for [[Size]] and [[Cardinality]].
    + */
    +abstract class SizeUtil extends UnaryExpression with ExpectsInputTypes {
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    +  override def nullable: Boolean = false
    +
    +  def sizeEval(child: Expression, input: InternalRow, resultTypeBigInt: Boolean): Any = {
    +    val value = child.eval(input)
    +    val result = if (value == null) {
    +      -1
    +    } else child.dataType match {
    +      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    +      case _: MapType => value.asInstanceOf[MapData].numElements()
    +    }
    +    if (resultTypeBigInt) {
    +      new Decimal().setOrNull(result.asInstanceOf[Int].toLong, DecimalType.MAX_PRECISION, 0)
    +    } else {
    +      result
    +    }
    +  }
    +
    +  def doSizeGenCode(ctx: CodegenContext, ev: ExprCode, resultTypeBigInt: Boolean): ExprCode = {
    --- End diff --
    
    minor: what about adding a `def resultTypeBigInt` (or something similar) which is overridden by the subclasses instead of having it as a argument?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89141/testReport)** for PR 21031 at commit [`8945854`](https://github.com/apache/spark/commit/89458544b564097f2fa6d34ae654c86f155410df).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Cardinality(child: Expression) extends UnaryExpression with ExpectsInputTypes `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89269/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89990/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2789/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21031


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #90026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90026/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2823/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2276/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    If there is already size, why do we need to create a new implementation? Why can't we just rewrite cardinality to size? 
    
    Also I wouldn't add any programming API for this, since there is already size.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89332/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89332/testReport)** for PR 21031 at commit [`a21f85b`](https://github.com/apache/spark/commit/a21f85ba2bd4c2f3ee33ac0499a4f92fe2e54629).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r182559515
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -407,6 +407,7 @@ object FunctionRegistry {
         expression[MapKeys]("map_keys"),
         expression[MapValues]("map_values"),
         expression[Size]("size"),
    +    expression[Cardinality]("cardinality"),
    --- End diff --
    
    That might be fine in most cases. If we really need it In the future, we can add new APIs with better names later?
    
    Like in SQL, we have COUNT and COUNT_BIG. COUNT_BIG is similar to COUNT but the result can be grater than the max value of integer. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89206/testReport)** for PR 21031 at commit [`67f8716`](https://github.com/apache/spark/commit/67f87169a1f11d80aecbd6c035a1bc707c37c400).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89189/testReport)** for PR 21031 at commit [`d405d8a`](https://github.com/apache/spark/commit/d405d8a37c6e46340f49ff449b8a5aeca80de751).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2806/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89169/testReport)** for PR 21031 at commit [`d405d8a`](https://github.com/apache/spark/commit/d405d8a37c6e46340f49ff449b8a5aeca80de751).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r182526092
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -407,6 +407,7 @@ object FunctionRegistry {
         expression[MapKeys]("map_keys"),
         expression[MapValues]("map_values"),
         expression[Size]("size"),
    +    expression[Cardinality]("cardinality"),
    --- End diff --
    
    ping @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2204/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89269/testReport)** for PR 21031 at commit [`11fe6f7`](https://github.com/apache/spark/commit/11fe6f7bad1690f181f620516274ebd070712b3c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r182622638
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3282,6 +3282,14 @@ object functions {
        */
       def size(e: Column): Column = withExpr { Size(e.expr) }
     
    +  /**
    +   * Returns length of array or map as long.
    +   *
    +   * @group collection_funcs
    +   * @since 2.4.0
    +   */
    +  def cardinality(e: Column): Column = withExpr { Cardinality(e.expr) }
    --- End diff --
    
    I see.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89258/testReport)** for PR 21031 at commit [`11fe6f7`](https://github.com/apache/spark/commit/11fe6f7bad1690f181f620516274ebd070712b3c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r181340756
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3282,6 +3282,14 @@ object functions {
        */
       def size(e: Column): Column = withExpr { Size(e.expr) }
     
    +  /**
    +   * Returns length of array or map as BigInt.
    --- End diff --
    
    BigInt -> long


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89646/testReport)** for PR 21031 at commit [`c6e8d81`](https://github.com/apache/spark/commit/c6e8d81aed5da53569242a648652d2a46eddecc3).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89669/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #90046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90046/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90017/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2760/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89254/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2556/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2781/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r181028951
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -24,6 +25,47 @@ import org.apache.spark.sql.catalyst.expressions.codegen._
     import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, MapData}
     import org.apache.spark.sql.types._
     
    +/**
    + * Common base class for [[Size]] and [[Cardinality]].
    + */
    +abstract class SizeUtil extends UnaryExpression with ExpectsInputTypes {
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    +  override def nullable: Boolean = false
    +
    +  def sizeEval(child: Expression, input: InternalRow, resultTypeBigInt: Boolean): Any = {
    +    val value = child.eval(input)
    +    val result = if (value == null) {
    +      -1
    +    } else child.dataType match {
    +      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    +      case _: MapType => value.asInstanceOf[MapData].numElements()
    +    }
    +    if (resultTypeBigInt) {
    +      new Decimal().setOrNull(result.asInstanceOf[Int].toLong, DecimalType.MAX_PRECISION, 0)
    +    } else {
    +      result
    +    }
    +  }
    +
    +  def doSizeGenCode(ctx: CodegenContext, ev: ExprCode, resultTypeBigInt: Boolean): ExprCode = {
    --- End diff --
    
    Now, I realized `dataType` can be used for this purpose. Thank you for your comment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #90017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90017/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89150/testReport)** for PR 21031 at commit [`d405d8a`](https://github.com/apache/spark/commit/d405d8a37c6e46340f49ff449b8a5aeca80de751).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89332/testReport)** for PR 21031 at commit [`a21f85b`](https://github.com/apache/spark/commit/a21f85ba2bd4c2f3ee33ac0499a4f92fe2e54629).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #89254 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89254/testReport)** for PR 21031 at commit [`b4f9839`](https://github.com/apache/spark/commit/b4f98396784db88dae2139c74c267526dc72d82b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89150/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89280/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r181622107
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -407,6 +407,7 @@ object FunctionRegistry {
         expression[MapKeys]("map_keys"),
         expression[MapValues]("map_values"),
         expression[Size]("size"),
    +    expression[Cardinality]("cardinality"),
    --- End diff --
    
    `expression[Size]("cardinality"),` Does this work?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r181038144
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -34,28 +66,38 @@ import org.apache.spark.sql.types._
           > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
            4
       """)
    -case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes {
    +case class Size(child: Expression) extends SizeUtil {
       override def dataType: DataType = IntegerType
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
    -  override def nullable: Boolean = false
     
       override def eval(input: InternalRow): Any = {
    -    val value = child.eval(input)
    -    if (value == null) {
    -      -1
    -    } else child.dataType match {
    -      case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
    -      case _: MapType => value.asInstanceOf[MapData].numElements()
    -    }
    +    sizeEval(child, input)
       }
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    val childGen = child.genCode(ctx)
    -    ev.copy(code = s"""
    -      boolean ${ev.isNull} = false;
    -      ${childGen.code}
    -      ${CodeGenerator.javaType(dataType)} ${ev.value} = ${childGen.isNull} ? -1 :
    -        (${childGen.value}).numElements();""", isNull = FalseLiteral)
    +    doSizeGenCode(ctx, ev)
    +  }
    +}
    +
    +/**
    + * Given an array or map, returns its size as BigInt. Returns -1 if null.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr) - Returns the size of an array or a map as long. Returns -1 if null.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
    +       4
    +  """,
    +  since = "2.4.0")
    +case class Cardinality(child: Expression) extends SizeUtil {
    +  override def dataType: DataType = LongType
    +
    +  override def eval(input: InternalRow): Any = {
    --- End diff --
    
    Good catch, thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r183166915
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2124,6 +2124,21 @@ def size(col):
         return Column(sc._jvm.functions.size(_to_java_column(col)))
     
     
    +@since(2.4)
    +def cardinality(col):
    --- End diff --
    
    Could you also remove this? I think we just need to do it for SQL functions now. In the future, if this is requested by the community, we can reconsider it. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Sure, done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    @kiszk Could you also update the PR description? LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r183196830
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2124,6 +2124,21 @@ def size(col):
         return Column(sc._jvm.functions.size(_to_java_column(col)))
     
     
    +@since(2.4)
    +def cardinality(col):
    --- End diff --
    
    I see.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    **[Test build #90068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90068/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89169/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90026/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r181621963
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3282,6 +3282,14 @@ object functions {
        */
       def size(e: Column): Column = withExpr { Size(e.expr) }
     
    +  /**
    +   * Returns length of array or map as long.
    +   *
    +   * @group collection_funcs
    +   * @since 2.4.0
    +   */
    +  def cardinality(e: Column): Column = withExpr { Cardinality(e.expr) }
    --- End diff --
    
    Do not add the function APIs. Adding the SQL function is enough.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r181652753
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -407,6 +407,7 @@ object FunctionRegistry {
         expression[MapKeys]("map_keys"),
         expression[MapValues]("map_values"),
         expression[Size]("size"),
    +    expression[Cardinality]("cardinality"),
    --- End diff --
    
    In Presto, `cardinality`'s return type is `BigInt`. Thus, `cardinality` in Spark uses uses `long` as return type. If we use `int` as `cardinality`'s return type, I think that it works.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21031
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org