You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/28 03:04:36 UTC

[GitHub] [spark] xinrong-databricks opened a new pull request #35671: Introduce SQL function ARRAY_SIZE

xinrong-databricks opened a new pull request #35671:
URL: https://github.com/apache/spark/pull/35671


   ### What changes were proposed in this pull request?
   Introduce SQL function ARRAY_SIZE.
   
   ### Why are the changes needed?
   Counting elements within an array is a common use case. Other DBRMS like Snowflake supports that as well:
   https://docs.snowflake.com/en/sql-reference/functions/array_size.html
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yea. `array_size` is available now.
   
   
   ### How was this patch tested?
   Unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #35671:
URL: https://github.com/apache/spark/pull/35671


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xinrong-databricks commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #35671:
URL: https://github.com/apache/spark/pull/35671#discussion_r817391074



##########
File path: sql/core/src/test/resources/sql-tests/inputs/array.sql
##########
@@ -106,3 +106,10 @@ select elt(2, '123', null);
 
 select array(1, 2, 3)[5];
 select array(1, 2, 3)[-1];
+
+-- array_size
+select array_size(array()) = size(array());

Review comment:
       Sounds good!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35671:
URL: https://github.com/apache/spark/pull/35671#discussion_r817402644



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -133,6 +133,34 @@ object Size {
   def apply(child: Expression): Size = new Size(child)
 }
 
+
+/**
+ * Given an array, returns total number of elements in it.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(expr) - Returns the size of an array. The function returns null for null input.

Review comment:
       nit: looking at other functions, I think we should write `usage = "...."` instead of multiline string here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xinrong-databricks commented on pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #35671:
URL: https://github.com/apache/spark/pull/35671#issuecomment-1057608199


   Pardon me for one last rebase in order to update `sql-expression-schema.md`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xinrong-databricks commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #35671:
URL: https://github.com/apache/spark/pull/35671#discussion_r817391736



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -133,6 +133,53 @@ object Size {
   def apply(child: Expression): Size = new Size(child)
 }
 
+
+/**
+ * Given an array, returns total number of elements in it.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(expr) - Returns the size of an array.
+    The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or
+    spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input.
+    With the default settings, the function returns -1 for null input.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
+       4
+  """,
+  since = "3.3.0",
+  group = "collection_funcs")
+case class ArraySize(expr: Expression, legacySizeOfNull: Option[Boolean]) extends RuntimeReplaceable

Review comment:
       Good point! Legacy behavior has been removed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35671:
URL: https://github.com/apache/spark/pull/35671#discussion_r816562959



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -133,6 +133,53 @@ object Size {
   def apply(child: Expression): Size = new Size(child)
 }
 
+
+/**
+ * Given an array, returns total number of elements in it.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(expr) - Returns the size of an array.
+    The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or
+    spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input.
+    With the default settings, the function returns -1 for null input.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
+       4
+  """,
+  since = "3.3.0",
+  group = "collection_funcs")
+case class ArraySize(expr: Expression, legacySizeOfNull: Option[Boolean]) extends RuntimeReplaceable

Review comment:
       it's a new function, do we need to keep the legacy behavior?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35671:
URL: https://github.com/apache/spark/pull/35671#discussion_r817402188



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -133,6 +133,34 @@ object Size {
   def apply(child: Expression): Size = new Size(child)
 }
 
+
+/**
+ * Given an array, returns total number of elements in it.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(expr) - Returns the size of an array. The function returns null for null input.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
+       4
+  """,
+  since = "3.3.0",
+  group = "collection_funcs")
+case class ArraySize(expr: Expression) extends RuntimeReplaceable with ImplicitCastInputTypes {

Review comment:
       nit: `with UnaryLike[Expression]` so that we don't need to implement `def children`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35671:
URL: https://github.com/apache/spark/pull/35671#issuecomment-1059142758


   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35671:
URL: https://github.com/apache/spark/pull/35671#discussion_r816563797



##########
File path: sql/core/src/test/resources/sql-tests/inputs/array.sql
##########
@@ -106,3 +106,10 @@ select elt(2, '123', null);
 
 select array(1, 2, 3)[5];
 select array(1, 2, 3)[-1];
+
+-- array_size
+select array_size(array()) = size(array());

Review comment:
       let's test `array_size` only instead of comparing with `size`. We should just test `select array_size(array());`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xinrong-databricks commented on pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #35671:
URL: https://github.com/apache/spark/pull/35671#issuecomment-1055083507


   May I ask for a review when you are free? Thanks! @cloud-fan @gengliangwang @MaxGekk @entong


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xinrong-databricks commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #35671:
URL: https://github.com/apache/spark/pull/35671#discussion_r818262652



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -133,6 +133,34 @@ object Size {
   def apply(child: Expression): Size = new Size(child)
 }
 
+
+/**
+ * Given an array, returns total number of elements in it.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(expr) - Returns the size of an array. The function returns null for null input.

Review comment:
       Good catch! Modified.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -133,6 +133,34 @@ object Size {
   def apply(child: Expression): Size = new Size(child)
 }
 
+
+/**
+ * Given an array, returns total number of elements in it.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(expr) - Returns the size of an array. The function returns null for null input.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
+       4
+  """,
+  since = "3.3.0",
+  group = "collection_funcs")
+case class ArraySize(expr: Expression) extends RuntimeReplaceable with ImplicitCastInputTypes {

Review comment:
       Makes sense, updated.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xinrong-databricks commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #35671:
URL: https://github.com/apache/spark/pull/35671#discussion_r819266462



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -133,6 +133,33 @@ object Size {
   def apply(child: Expression): Size = new Size(child)
 }
 
+
+/**
+ * Given an array, returns total number of elements in it.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the size of an array. The function returns null for null input.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
+       4
+  """,
+  since = "3.3.0",
+  group = "collection_funcs")
+case class ArraySize(expr: Expression)

Review comment:
       Good idea!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35671:
URL: https://github.com/apache/spark/pull/35671#discussion_r818388291



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -133,6 +133,33 @@ object Size {
   def apply(child: Expression): Size = new Size(child)
 }
 
+
+/**
+ * Given an array, returns total number of elements in it.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the size of an array. The function returns null for null input.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
+       4
+  """,
+  since = "3.3.0",
+  group = "collection_funcs")
+case class ArraySize(expr: Expression)

Review comment:
       ```suggestion
   case class ArraySize(child: Expression)
   ```
   then we don't need to add `def child`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35671:
URL: https://github.com/apache/spark/pull/35671#issuecomment-1057664062


   There are compilation errors.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org