Posted to commits@spark.apache.org by li...@apache.org on 2018/05/02 20:53:13 UTC
spark git commit: [SPARK-23923][SQL] Add cardinality function
Repository: spark
Updated Branches:
refs/heads/master 504c9cfd2 -> 5be8aab14
[SPARK-23923][SQL] Add cardinality function
## What changes were proposed in this pull request?
The PR adds the SQL function `cardinality`, whose behavior is based on Presto's function of the same name.
The function returns the length of the array or map stored in the column as an `int`, whereas the Presto version returns it as a `BigInt` (`long` in Spark). The difference in return type is discussed [here](https://github.com/apache/spark/pull/21031#issuecomment-381284638) and [there](https://github.com/apache/spark/pull/21031#discussion_r181622107).
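To make the described semantics concrete, here is a minimal sketch in plain Scala that models them on ordinary Scala collections instead of Spark columns. The object `CardinalitySketch` and its methods are hypothetical names, not Spark APIs. Returning -1 for a null input is an assumption taken from the expected rows in this commit's test (`Row(-1)`). `cardinalityAsLong` only illustrates the `int` versus `long` return-type difference and does not model Presto's null handling.

```scala
// Hypothetical sketch of the semantics described above, on plain Scala
// collections (no Spark dependency). Not Spark's implementation.
object CardinalitySketch {
  // Element count of an array-like (Seq) or map input, as a 32-bit Int.
  // Assumption: a null input yields -1, matching this commit's test rows.
  def cardinality(value: Any): Int = value match {
    case null         => -1
    case s: Seq[_]    => s.length
    case m: Map[_, _] => m.size
    case other =>
      throw new IllegalArgumentException(s"expected an array or map, got: $other")
  }

  // Same count widened to Long, illustrating a BIGINT-style return type.
  // Only the type differs; Presto's null handling is not modeled here.
  def cardinalityAsLong(value: Any): Long = cardinality(value).toLong

  def main(args: Array[String]): Unit = {
    // Inputs chosen to produce the counts 2, 0, 3, -1 that the tests expect.
    val column: Seq[Any] = Seq(Seq(1, 2), Seq.empty[Int], Seq(1, 2, 3), null)
    println(column.map(cardinality))          // List(2, 0, 3, -1)
    println(cardinalityAsLong(Map("a" -> 1))) // 1
  }
}
```

Keeping the count as an `Int` reflects the choice discussed in the PR; widening with `.toLong` is all that would separate it from a `long`-returning variant.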
## How was this patch tested?
Added unit tests to `DataFrameFunctionsSuite`.
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Closes #21031 from kiszk/SPARK-23923.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5be8aab1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5be8aab1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5be8aab1
Branch: refs/heads/master
Commit: 5be8aab14468e55b1049a0c83f02dcec0651162f
Parents: 504c9cf
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Authored: Wed May 2 13:53:10 2018 -0700
Committer: gatorsmile <ga...@gmail.com>
Committed: Wed May 2 13:53:10 2018 -0700
----------------------------------------------------------------------
.../apache/spark/sql/catalyst/analysis/FunctionRegistry.scala | 1 +
.../scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala | 5 +++++
2 files changed, 6 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/5be8aab1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 6bc7b4e..3ffbc9c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -409,6 +409,7 @@ object FunctionRegistry {
expression[MapKeys]("map_keys"),
expression[MapValues]("map_values"),
expression[Size]("size"),
+ expression[Size]("cardinality"),
expression[SortArray]("sort_array"),
expression[ArrayMin]("array_min"),
expression[ArrayMax]("array_max"),
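The registry change above makes `cardinality` a second name for the existing `Size` expression, so both names share one implementation and one result type. The toy registry below sketches that idea in plain Scala; `AliasRegistrySketch`, `ToySize`, and `invoke` are hypothetical and do not reflect how Spark's `FunctionRegistry` is actually implemented.

```scala
// Hypothetical miniature function registry illustrating name aliasing.
// None of these names are Spark APIs.
object AliasRegistrySketch {
  // Toy stand-in for an expression class such as Size: evaluates to the
  // number of elements in its single array argument.
  final case class ToySize(input: Seq[Any]) {
    def eval(): Int = input.length
  }

  // One builder shared by every name registered for ToySize.
  private val toySizeBuilder: Seq[Any] => ToySize = {
    case Seq(arr: Seq[_]) => ToySize(arr)
    case args =>
      throw new IllegalArgumentException(s"expected one array argument, got ${args.length}")
  }

  // Registering the same builder under two names: "cardinality" becomes
  // another name for exactly the same logic as "size".
  val registry: Map[String, Seq[Any] => ToySize] = Map(
    "size"        -> toySizeBuilder,
    "cardinality" -> toySizeBuilder
  )

  // Looks up a function by case-insensitive name and evaluates it.
  def invoke(name: String, args: Any*): Int = registry.get(name.toLowerCase) match {
    case Some(build) => build(args).eval()
    case None        => throw new NoSuchElementException(s"undefined function: $name")
  }

  def main(args: Array[String]): Unit = {
    println(invoke("size", Seq(1, 2, 3)))        // 3
    println(invoke("cardinality", Seq(1, 2, 3))) // 3
  }
}
```

Because both entries point at the same builder, any change to the shared logic applies to both names at once, and neither name can drift to a different return type.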
http://git-wip-us.apache.org/repos/asf/spark/blob/5be8aab1/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
index 470a1c8..a5163ac 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
@@ -341,6 +341,11 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
df.selectExpr("size(a)"),
Seq(Row(2), Row(0), Row(3), Row(-1))
)
+
+ checkAnswer(
+ df.selectExpr("cardinality(a)"),
+ Seq(Row(2L), Row(0L), Row(3L), Row(-1L))
+ )
}
test("map size function") {